PART I



INTRODUCTION 

During the spring of 1967, the Committee on Educational Research,
University of South Carolina, began a long-term investigation
of problem-solving ability in young children. The study was funded
by Project Head Start and is now in its fourth year of data collection
and analysis.

The study was planned to have specific application to certain
critical problems in the field of childhood education as well as
more general implications for educational theory and practice.

Among the immediate goals of the study was the discovery of more
effective means of describing the progress of various sub-populations
of children with respect to problem-solving abilities. Among the
long-term goals of the study were the development of improved testing
and measurement techniques and effective curriculum strategies based
on these descriptions.

The present document is an initial report of findings resulting
from the study and includes a description of the problem
addressed, the readiness context for the investigation, the research
question and procedures, analysis of the data, conclusions and
implications, and recommendations. The several appendices contain
procedural information, analysis tables, and supplementary data.
THE PROBLEM

The present wide-spread interest in the development and eval- 
uation of curricula for pre-school educational programs is a rel- 
atively new phenomenon in American society. The importance of early 
learning has generally been recognized by learning theorists, but 
the impetus needed for the extensive research necessary in construct- 
ing and testing efficient curricula was lacking before the mid 1960’s. 
The focusing of social consciousness on the plight of the disad- 
vantaged child at that time has resulted in great activity in the 
field during the past four or five years. As Merwin has written: 

The third new area which has prompted a good deal
of evaluation activity has been that of early
childhood education. An increasing amount of
research which points to the severe handicap of
children who enter school without a prior stimulating
environmental experience has centered
much attention on the young child. In the past,
designers of educational evaluation, as a rule,
have paid little attention to children under the
traditional school age. However, when such federal
projects as Head Start and various programs
sponsored by the Office of Economic Opportunity
called for work with children of preschool ages,
they prompted a flurry of activity in attempts to
do the kind of evaluation that was needed as a
basis for planning meaningful educational activities
for youngsters in the age group.¹



1 Jack C. Merwin, "Historical Review of Changing Concepts of
Evaluation," Educational Evaluation: New Roles, New Means, Sixty-eighth
Yearbook of the National Society for the Study of Education,
Part II (Chicago: University of Chicago Press, 1969), p. 20.
The more or less sudden implementation of numerous programs for
young children emphasized many areas of disagreement in the field 
as well as a sparsity of critical information. In general, the goals 
to be attained by pre-school education have not been clarified; the 
content of curricula is uncertain; and measurement instruments and 
strategies do not seem dependable. In a word, and theory
notwithstanding, relatively little is known about the manner in which
mental development occurs in young children.

The fact becomes readily apparent as efforts are made to eval- 
uate the effectiveness of various intervention programs. All too 
often anticipated movement on significant dependent variables has not 
been detected. Programs that would seem on the basis of face validity 
to make a difference in the intellectual development of children can- 
not be shown on the basis of empirical evidence to have done so. 

Some have viewed this as curriculum deficiencies; they have not
believed the curricula to be appropriate, whatever the apparent validity.
Others have blamed the results on measurement deficiencies. The 
latter have contended that existing or newly-developed instruments 
are simply not sensitive enough or that they have been standard- 
ized on populations different from those being studied. 

However one views the various problems associated with early
childhood education, one thing seems true: we are not yet able to
describe adequately mental development in the early years. By use
of the word "adequately," the present writers mean with sufficient
validity and precision to give fruitful direction and specificity
to the work already done and being done in curriculum, instruction,
and evaluation.
The problem area, therefore, addressed by the present investigators
was that of describing mental development (specifically,
problem-solving abilities) in young children. Naturally, the
inquiry would address the traditional readiness concept, but
readiness identified through an extensive, inductive-empirical approach.
In other words, the initial goal would be to operationalize
readiness behaviors.

Within the framework of readiness, two considerations were regarded
as of primary importance. These had to do with the comparing
and contrasting of defined subpopulations. On the one hand, there 
was the identification of similarities in development for different 
subpopulations; and on the other hand, there was the identification 
of differences in development between and among subpopulations of 
children. Obviously such information would have important impli- 
cations for both curriculum and evaluation. 

At this point, the present investigators made explicit their 
view of the readiness concept with definitions and directional
assumptions. The position which serves as the context for the
presently reported research is the subject of the following section.
PART III



READINESS: THE RESEARCH CONTEXT 

The General View of Readiness 
The notion that learning takes place most effectively and 
efficiently when instruction is introduced at the appropriate 
time is well established among educators and psychologists. 

There is general agreement as to the importance of identifying
"readiness" points for a particular learner with respect to
specific tasks or skills to be taught. Thus, there is little
argument regarding the general idea of readiness, at least as
a hypothetical point on some underlying continuum, and teachers
are exhorted to capitalize on "teachable moments."

On the other hand, controversy arises when one moves past 
such definition-derived statements as, "The concept of readiness 
simply refers to the adequacy of existing capacity in relation 
to the demands of a given learning task"² and attempts to identify
more usefully the concept of readiness. In the matter of delin- 
eating causal factors related to readiness or defining readiness 
points for particular activities, positions vary considerably. 



2 David P. Ausubel, "What Shall the Schools Teach? Viewpoints
from Related Disciplines: Human Growth and Development," Teachers
College Record, LX (February, 1959), 247.
Views range from the position that readiness for learning depends
entirely upon biological growth (which can only come with the 
passage of time) to multi-dimensional positions which include all 
facets of the learner and his environment. 

Enumeration of specific traits and influences that are be- 
lieved to determine a learner’s readiness for particular learning 
would include many items: physical, social, emotional, mental,
and so on. The grouping of these specific correlates to readiness
into meaningful determinants has been a somewhat arbitrary matter, 
but classifications generally have grouped them into the two cate- 
gories of maturation and experience. 

Maturation has been defined as a process which depends upon
biological rather than experiential factors. Thus viewed, maturation
is that development which "...takes place in the demonstrable
absence of specific practice experience ... those that are
attributed to genic influences and/or incidental experience."³
It is believed that this development "...occurs practically
independent of outside stimulation."⁴ McCandless has described the
process as "...a neuro-physiological-biochemical change from
conception to death ... which occurs as a function of time or age."⁵

In general, research into the influence of maturation upon 
readiness has employed one or both of two general strategies. 



3 Ibid.

4 M. Johnson, Psychology: A Problem-solving Approach (New
York: Harper & Brothers, 1961), p. 12.

5 Boyd R. McCandless, Children and Adolescents (New York:
Holt, Rinehart and Winston, 1961), p. 118.

In the first case, the learner is restricted in practice or deprived
of relevant experiences. In the second strategy, practice or ex- 
perience is introduced to the learner at an earlier age than normal. 

The majority of studies employing the restriction of practice 
or the deprivation of experience have used animals as subjects, and 
they have uniformly demonstrated that restriction may cause per- 
manent impairment if the restriction is prolonged beyond a critical 
period. The phenomenon of imprinting is related to the concept of 
critical periods in maturation. Information related to restriction 
and deprivation impairment in humans is very limited and comes from 
accounts of "wild children" reared in isolation from human contacts
and from accounts of infants reared during their first few years 
without appropriate psychological stimulation. 

In some contrast, numerous studies have been conducted with 
children as subjects to determine the effects of early practice
upon functions normally acquired at a later time in the child’s 
life. The results of these studies generally support the importance 
of added maturation that comes with passage of time and the inef- 
fectiveness of early practice. Studies of this type have led to 
the acceptance by many educators of the "delaying doctrine" with
respect to both motor skills and cognitive processes. They argue
that if maturation implies a gradual, biological unfolding,
independent of learning and practice, there is little a teacher can
do but await some outward manifestation which presumably signifies 
that the pupil has attained a given maturity level. 

Although chronological age and school grade level have both 
been used as general referents of mental maturation, the most
effective methods of measuring mental maturity have centered on
the concept of mental age as determined by means of intelligence 
tests. In reading, for example, estimates have been made on the 
basis of experimental studies that the optimum-minimum mental age 
for beginning to read is six and one-half years. Arithmetical 
topics have also been assigned to specific mental ages: "Multiplication
facts should not be taught below a mental age level of
eight years, four months."⁶

The second category of causal or determinant influences on
readiness is that of experiences. The great emphasis on pre-school
education in recent years (Head Start, for example) reflects
the importance that educators and the general public have
placed on this aspect of the readiness concept. With respect to
readiness for reading, Russell has written:

The teacher cannot just wait for readiness to be 
achieved. General maturation is important, but 
the teacher must also provide experiences which 
contribute to the growth of reading readiness.⁷

Harris indicates that reading readiness is dependent in part on a
child's biological growth and in part on his learning experiences.⁸



6 Carleton W. Washburne, "The Grade Placement of Arithmetic
Topics: A Committee of Seven Investigation," Report of the
Society's Committee on Arithmetic, 29th Yearbook of the NSSE,
Part II (Chicago: University of Chicago Press, 1930), p. 656.

7 David H. Russell, Children Learn to Read , (2nd ed.) (Waltham, 
Massachusetts: Blaisdell Publishing Company, 1961), p. 169. 

8 Albert J. Harris, Effective Teaching of Reading (New York: 
David McKay Company, Inc., 1962), p. 22. 
And Ausubel states:

Whether or not readiness exists does not necessarily 
depend on maturation alone but in many instances is 
solely a function of prior learning experience and 
most typically depends on varying proportions of 
maturation and learning.⁹

The notion that prior learning experience is a vital aspect of
the readiness concept has been demonstrated, of course, since the
beginning of graded textbooks and materials. Logically, the learning
of certain materials requires that the learner has become
familiar with less complex but related ideas. Gagné has advanced
this notion, explicitly, with his concept of task analysis in the
construction of curriculum.¹⁰

The foregoing discussion has been presented in order to outline 
the general view of readiness held by educators and psychologists 
at the present time. With the exception of the work being done by
Gagné and others working along similar lines, the concept of
readiness has not been operationalized in a fashion that has made it of
extensive empirical value. That is to say, our knowledge of
readiness has not been greatly productive in advancing the practice and
understanding of education.

9 David P. Ausubel, "What Shall the Schools Teach? Viewpoints
from Related Disciplines: Human Growth and Development," Teachers
College Record, LX (February, 1959), 248.

10 Robert M. Gagné, "Curriculum Research and the Promotion of
Learning," Perspectives of Curriculum Evaluation, AERA Monograph
Series on Curriculum Evaluation (Chicago: Rand McNally & Company,
1967), I, pp. 20-23.

An Operational View of Readiness
In approaching the problem of readiness, the present investigators
began with two assumptions that are commonplace and generally
accepted by educators and psychologists. The first of these is
that the appearance of problem-solving skills in an individual is
patterned such that uniquely related skills appear in an easy-to-hard
sequence in which the ability to perform a given task occurs
prior to the performance of certain more complex tasks. In other
words, these skills appear in definable types and in common
sequences from easy to difficult within types and across individuals.
The second assumption is that the appearance of these skills is
a function of both time (maturation) and experience (learning).

The two assumptions naturally led to the consideration of
readiness in terms of a two-dimensional matrix in which the
horizontal axis represented types of related skills (e.g., word
fluency, number ability) and the vertical axis represented the sequence
of appearance of the skills (easy-to-hard, e.g., addition,
subtraction, multiplication, division). If one then could describe
the entries in the matrix (the problem-solving skills) in
sufficiently operational terms, then extensive, empirical research
might lead to a specific body of information related to readiness
which could be applied in a practical fashion to instruction and
evaluation.

Of course, the idea of describing readiness or mental develop- 
ment with a two-dimensional matrix of "traits ' 1 and levels of 
traits was hardly original. But the possibility of operational- 
izing entries within the cells of the matrix, if awkward or arti- 
ficial assumptions could be avoided, appeared to be a very fruitful 




direction for inquiry. 
The present investigators then determined that each entry in
the matrix would be a description of a unit of behavior or a type
of task which an individual either could or could not perform. The
behavior would be defined and delimited in terms of a type of problem
that the individual would be instructed to solve. Examples of such
problems might be: (1) close the door and return to your seat, (2)
add five and three, (3) what color is the dress? and (4) compute the
hypotenuse of a right triangle. As the matrix would be developed
through an empirical investigation, it would not be necessary to
make an assumption concerning what "type" of functioning was required
for solving a particular problem.

If a large number of problems sufficiently varied in types and
levels of difficulty could be presented to a large population of
individuals sufficiently varied in levels of mental development,
it might be possible to analyze the responses of the individuals
in such a way that horizontal (trait) categories might be formed
and the problems arranged within the categories in an easy-to-hard
sequence. Upon completion, the matrix would be an operational 
profile of problem-solving development in which the development 
sequence of skills would circumscribe readiness levels. Not only 
would the profile provide an operational approach to readiness, 
but the inductive and empirical nature of the profile could be 
expected to be of considerable heuristic value. 
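
As a rough illustration of the idea in the preceding paragraph, the sketch below takes a small, invented pass/fail table (1 = pass, 0 = fail) for a single hypothetical trait category and orders the tasks from easy to hard by the proportion of children passing each one. The task names, the data, and the use of pass proportion as the ordering criterion are all assumptions made for illustration; the report does not name its analytic technique at this point.

```python
# Hypothetical pass/fail responses: one row per child, one column per task.
# 1 = pass, 0 = fail. All names and values are invented for illustration.
TASKS = ["multiply", "count", "subtract", "add"]
RESPONSES = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]


def pass_proportions(tasks, responses):
    """Return {task: proportion of children who passed it}."""
    n = len(responses)
    return {t: sum(row[j] for row in responses) / n for j, t in enumerate(tasks)}


def easy_to_hard(tasks, responses):
    """Order tasks from easiest (most passes) to hardest (fewest passes)."""
    props = pass_proportions(tasks, responses)
    return sorted(tasks, key=lambda t: -props[t])


if __name__ == "__main__":
    print(easy_to_hard(TASKS, RESPONSES))
```

Note that the ordering here uses only the responses themselves and makes no reference to the children's ages, which is in keeping with the requirement stated in the text that scaling proceed without a coordinate variable.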

The problems associated with such a line of inquiry would be 
numerous and many of them were immediately apparent. First, the 
selection of appropriately varied tasks to be included would not
be easy. Every effort must be made to see that they were as
representative of a universe of cognitive and psychomotor problems
as possible.

Secondly, the method of administering the problems to indi- 
viduals must be such that each one could be scored as either an 
absolute pass or fail with the degree of testing error lowered to 
a minimum. Finally, a method of analysis must be identified or 
constructed that would be appropriate for treating dichotomous data 
in a manner that would result in clusters of scaled tasks without 
reference to a coordinate variable such as age. 

At this point, the methodological problems were becoming at 
least apparent if not soluble. But there were still major con- 
ceptual problems. First, it was necessary to define what was meant 
by readiness. Continuing to emphasize the operational nature of 
the inquiry, readiness was defined in the following manner: A
readiness behavior is a unit of behavior that an individual performs
prior to performing another given unit of behavior. Further, the
identification and description of a given readiness behavior was
posited as desirable because it precedes the achievement of some
objective or goal unit of behavior. An example of a readiness
behavior might be the selection of the color red prior to performing the
task: "Paint the house red." The point here is that a readiness
behavior is always defined in terms of readiness for what? Once the
what, or goal behavior, is defined, then those behaviors that precede
it (by empirical test) are readiness behaviors. When these are
sequenced, an investigator theoretically could identify the sequence
of readiness behaviors to some goal unit of behavior as well as
assess the readiness level of a particular individual with respect
to the goal behavior. From a practical viewpoint, the validity of
the identification of readiness levels in an empirical investigation
would depend upon the inclusion of an appropriately varied (in terms
of mental development) population, a precise method of measurement,
and a highly sensitive and sophisticated analytic technique. The
extent of readiness identification with respect to various goal
behaviors would depend on the variety of tasks (in terms of both type
and difficulty levels) included in the investigation.

In view of the definition for a readiness behavior offered above,
it is important to note that one unit of behavior may precede another
unit of behavior for any one of at least three reasons. First, it
may be inherent in the organism that he learn one thing before another.
Secondly, the necessity of learning one thing before another may be
inherent in the subject matter (one must be able to count before
going on to other mathematical operations). Finally, certain behaviors
may precede others in the development of a child because the culture
in which the child lives presents experiences in a particular order.
Therefore the readiness definition does not posit that one unit of
behavior must precede another in order to be identified as a readiness
level for that behavior; it is only defined as a behavior that does
precede it.
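
One way to make "precedes by empirical test" concrete with pass/fail data is sketched below. The rule used is an assumption for illustration, not the report's stated procedure: task a is said to precede task b when almost no child passes b while failing a, and at least one child passes a while failing b. The function name, the tolerance parameter, and the records are hypothetical.

```python
def precedes(a, b, records, tolerance=0):
    """True if task `a` empirically precedes task `b` in `records`.

    `records` is a list of dicts mapping task name -> True (pass) / False (fail).
    Assumed rule: `a` precedes `b` when at most `tolerance` children pass `b`
    while failing `a`, and at least one child passes `a` while failing `b`.
    """
    reversals = sum(1 for r in records if r[b] and not r[a])
    supports = sum(1 for r in records if r[a] and not r[b])
    return reversals <= tolerance and supports > 0


# Invented data for illustration only.
RECORDS = [
    {"select_red": True, "paint_house_red": False},
    {"select_red": True, "paint_house_red": True},
    {"select_red": False, "paint_house_red": False},
]

if __name__ == "__main__":
    print(precedes("select_red", "paint_house_red", RECORDS))
```

As in the definition above, a result of True records only that one behavior does precede the other in the data; it says nothing about whether the cause lies in the organism, the subject matter, or the culture.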

A second conceptual problem was the naivete of the two-dimensional
matrix in the first place. Even without the assistance of
important theories and major research endeavors, simple speculation
would lead to the conclusion that the complexity and efficiency of
mental development is much too great to be described usefully with
a model so simple. Would it really be possible to separate mental
traits into exclusive columns of scaled behaviors in a way that
would lead to a useful view of readiness? Is it not possible that
a given task that might appear in a category of "number skills" at
some level is prerequisite for the learning of some task appearing
under "word knowledge" at a higher level?

The learning hierarchies presented by Gagné¹¹ and others working
along similar lines in curriculum and evaluation appeared to
offer a much more useful model. Instead of entries in a two-dimensional
matrix, readiness levels could be described as elements
of a readiness network in which the members were related on the
basis of the definition of a readiness behavior (a unit of behavior
that an individual performs prior to performing another given unit
of behavior). The concept is relatively simple but takes on important
implications as the attempt is made to construct it inductively
and empirically. The reader will note the similarity between the present
writers' position on readiness and Gagné's definition of
curriculum:

A curriculum is a sequence of content units arranged
in such a way that the learning of each unit may be
accomplished as a single act, provided the capabilities
described by specified prior units (in the sequence)
have already been mastered by the learner. . . . A
curriculum is specified when (1) the terminal objectives
are stated; (2) the sequence of prerequisite capabilities
is described; and (3) the initial capabilities
assumed to be possessed by the student are identified.¹²




11 Robert M. Gagné, "Curriculum Research and the Promotion of
Learning," Perspectives of Curriculum Evaluation, AERA Monograph
Series on Curriculum Evaluation (Chicago: Rand McNally & Company,
1967), I, pp. 20-23.

12 Ibid.
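
The readiness network described above can be pictured as a directed graph in which each behavior points back to the behaviors that precede it. The following is a minimal sketch under that assumption, using Python's standard `graphlib` module (Python 3.9 or later). The behaviors and links are invented, including one link that crosses from a "number" task to a "word" task, which a matrix of separate trait columns could not express.

```python
from graphlib import TopologicalSorter

# Hypothetical network: each behavior maps to the set of behaviors assumed
# to precede it. The entries are invented for illustration only.
NETWORK = {
    "add": {"count"},
    "subtract": {"count"},
    "multiply": {"add"},
    "read_number_words": {"count"},  # a cross-category link
}


def readiness_behaviors(goal, network):
    """Return every behavior that precedes `goal`, directly or through a chain."""
    found, stack = set(), list(network.get(goal, ()))
    while stack:
        behavior = stack.pop()
        if behavior not in found:
            found.add(behavior)
            stack.extend(network.get(behavior, ()))
    return found


def one_sequence(network):
    """Return one ordering in which every behavior follows its predecessors."""
    return list(TopologicalSorter(network).static_order())


if __name__ == "__main__":
    print(sorted(readiness_behaviors("multiply", NETWORK)))
    print(one_sequence(NETWORK))
```

Given a goal behavior, `readiness_behaviors` answers the "readiness for what?" question by listing everything upstream of the goal, while `one_sequence` yields one of possibly several admissible easy-to-hard orderings.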
The present investigators believe that the importance of conducting
extensive research in the area of readiness behavior can
hardly be overemphasized. If developmental networks of the kind
described can be constructed, the impact on education and psychology
could be considerable. Obviously, if one can plot how this development
takes place, it would then be possible to study why it takes
place in this way: whether the cause is inherent in the organism,
the society, etc. It appears that a first and necessary step toward
this goal is determining developmental sequences, the order in which
children in the nation attain problem-solving skills. Not only would
this be the initial task, but the identification of these sequences
would provide useful information in and of itself. Important insights
into human development could be expected; a basis would be provided
for cross-cultural comparisons; relevant data would be provided for
improving the measurement of problem-solving skills in young children;
and implications for the modification of education curricula may be
suggested. The eventual attainment of extensive networks would depend
upon this work aside from the immediate usefulness and utility
of the scaled items so identified. The following section of the
present report describes the research design and procedures used
in collecting the data for these scales.





PART IV 



THE RESEARCH DESIGN AND PROCEDURES 
Rationale 

The present investigation was designed to identify scales indicative
of the development of problem-solving behavior in young
children. The general question to be addressed was: Do children
of different backgrounds exhibit similarities in the order of
development and levels of achievement of problem-solving behaviors?

In order to answer the question stated above, it appeared 
necessary to present a large number of children of varied develop- 
mental statuses with a variety of problems — both in terms of types 
and apparent levels of difficulty. These problems or tasks must be 
logically related to those areas generally defined as cognitive or 
psychomotor in nature. If these tasks were administered to chil- 
dren in such a way that the child’s "maximum performance” or best 
effort could be elicited and the tasks were discrete in that the 
child would perform either successfully or unsuccessfully, then the 
analysis of responses would result in meaningful scales representing 
developmental continuums. 

The question of consistency across sub-cultural groups then could 
be answered through appropriate analyses. The possibility would exist 
that certain sequences of tasks (scales) would be consistent across 
sub-groups and represent developmental "universals." Others might
not be consistent and thus would define in a most meaningful manner
(for educational purposes) differences among sub-groups. It was 
on the basis of this general rationale that the Committee on Edu- 
cational Research proceeded with the design of the investigation. 




The Problem Tasks 

The first major problem in designing an investigation based 
on the above rationale was that of identifying a large number of 
problem-tasks that could be expected to elicit problem-solving
behavior from young children. It was considered particularly important
that the approach be as inductive with respect to the selection
of these tasks as possible. Of critical importance was the necessity 
of the tasks being varied, both with respect to format and content. 

A reasonable approach to the problem appeared to be a review 
of all available tests and procedures for measuring cognitive and 
psycho-motor skills in young children. If items on a given test were 
viewed as tasks independent of other items on the test, it would be 
possible to assemble the necessary array of problem-tasks. To this
end, more than fifty tests were reviewed by the Committee on Educational
Research. Outside consultants assisted with the review.

An item classification outline was developed as the tests were 
reviewed (see Appendix A). Each item on each of the tests was
classified according to the type of behavior it appeared to elicit. 
Through this process, it was possible to select the widest variety 
of problem-solving tasks and at the same time avoid extensive duplication.
See Appendix B for a more detailed statement of the procedure
used in selecting the tests and organizing them into "Batteries."

At length, items from twenty-two tests were selected for use in
the investigation. A listing of these tests appears in Appendix C.

Sample Selection 

Three fundamental considerations were paramount in the identification
and selection of children to be included in the investigation.
These included the age range of children to be tested,
the sub-cultural groups to be represented, and the total number of
children to be utilized.

With respect to the age range of children to be tested, the
decision was made to include principally four, five, and six-year
olds. The position was taken that inasmuch as the child would be
required to respond to verbal instructions in order to accomplish
the majority of the tasks, this was a feasible and defensible
age range to sample. It was also noted that this range could be
lowered in subsequent studies on the basis of data obtained in the
present investigation.

In view of the nature of the research rationale, it was also
necessary to have subjects spread equally across the age range.
If traits were to be identified and then scaled in order of the
skills included in each, obviously there must be provisions made
to insure that traits were being sampled at equal intervals along
the developmental continua. Thus, it was decided to divide the
age range of four through six years into three month intervals and
include the same number of children in each interval. That is to
say, there would be the same number of children in the age interval
4.0 - 4.3 months as between 4.4 - 4.6 months and so on.
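
The interval scheme can be sketched as follows. The sketch reads ages in the years.months notation used in Figure 1 (4.0 through 6.11) and assumes non-overlapping bins exactly three months wide starting at 4 years 0 months, which gives twelve intervals; the exact boundaries intended in the text are not certain, so the bin edges here are an assumption, as are the function names.

```python
from collections import Counter

START_MONTHS = 4 * 12       # age 4.0 (years.months), assumed lower bound
END_MONTHS = 6 * 12 + 11    # age 6.11, assumed upper bound
WIDTH = 3                   # assumed bin width in months


def age_interval(years, months):
    """Return the zero-based index of the three-month interval for an age."""
    if not 0 <= months < 12:
        raise ValueError("months must be between 0 and 11")
    total = years * 12 + months
    if not START_MONTHS <= total <= END_MONTHS:
        raise ValueError("age outside the sampled range")
    return (total - START_MONTHS) // WIDTH


def interval_counts(ages):
    """Count children per interval; `ages` is a list of (years, months) pairs."""
    return Counter(age_interval(y, m) for y, m in ages)


if __name__ == "__main__":
    print(interval_counts([(4, 0), (4, 2), (4, 3), (6, 11)]))
```

Comparing the counts returned by `interval_counts` across intervals is one simple way to check whether a sample is spread evenly over the age range.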

In the matter of subcultural groups to be represented in the
sample, the decision was made to include "disadvantaged" children
(as defined by Office of Economic Opportunity guidelines) and
"advantaged" children, defined as coming from families within
a specified income range.¹³ The two groups were further divided
into "Northern" and "Southern" with respect to the geographic
location of the subjects.

Finally, the total number of children to be included in the 
sample was determined, to some extent, by the minimum number re- 
quired in each of the subcultural groups for meaningful analysis 
and the maximum number considered feasible in view of the extensive- 
ness of the individual items to be administered. The nature and 
size of the sample is represented schematically in Figure 1 below: 





                         Economic Background
Geographic
Location       Advantaged          Disadvantaged       Total

North          N=353               N=196                 549
               Ages 4.0 - 6.11     Ages 4.0 - 6.11

South          N=417               N=464                 881
               Ages 4.0 - 6.11     Ages 4.0 - 6.11

TOTAL          770                 660                 1,430

Fig. 1. Sample Characteristics and Size






13 Advantaged Northern, family income of $8,000 to $22,000 per year;
Advantaged Southern, family income of $6,000 to $15,000 per year.
Testing Procedures and Controls
Once the various tests to be utilized in the investigation 
had been identified and the criteria for the sample established, 
it was necessary to design procedures and field controls that 
could be expected to yield data essentially free of contamination. 

These procedures and controls principally were related to the amount 
and frequency with which subjects would be tested and to the condi- 
tions under which tests would be administered. 

Inasmuch as twenty-two tests finally were chosen to be admin- 
istered, no individual child could be expected to undergo such ex- 
tensive testing in a relatively brief period of time without ex- 
cessive fatigue. On the other hand, if the time were extended past 
a month for the testing of one child, there would be a serious ques- 
tion as to whether or not the data from the collective tests could 
be considered comparable with respect to the developmental continuum. 

In other words, maturity would become a contaminating factor. 

The tests, therefore, were organized into four "batteries,"
each of which was to be administered to one-fourth of the total sample.
In each sub-cultural group, one-fourth of the children across the
age range would receive Battery I, one-fourth of the children would
receive Battery II, and so on. The division into batteries was made
in such a way as to vary the types of tests across batteries and to
achieve approximately equal administration times (6-7 hours) for
each battery.

In order that some basis for relating items across batteries
in subsequent studies would exist, two complete tests were designated
as "anchor" tests to be administered to every child in the
sample, regardless of battery. The two anchor tests
were the Stanford-Binet Intelligence Scale (Binet) and the Wechsler
Pre-school and Primary Scale of Intelligence (WPPSI). The Binet
was selected because it is widely used with pre-school aged children
and contains a variety of item types. The WPPSI was selected
even though it is a relatively new test (first published in 1966)
because of its relationship to another well-known and widely used
test, the Wechsler Intelligence Scale for Children. In addition
to these, the color items of the Caldwell-Soule Pre-school Inventory
were included as anchor items.

In addition to procedures involving the administrative sched- 
uling of the various tests, a number of control procedures were 
devised to assure consistency of testing conditions and validity 
of the data collected. These procedures, with the variables each
was designed to control, are presented in some detail in Appendix D.
In general, these procedures required that each battery of tests
(including the anchor tests) be administered to the same number 
of children. Anchor tests were to be administered prior to any 
battery tests, the Binet first and the WPPSI second in all cases. 
The order of administering the tests in a given battery was to be 
reversed in the two halves of a sample unit in an attempt to 
counter-balance whatever practice effects might accrue as a child 
was administered the tests in series. 
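
The scheduling rules described above can be illustrated with a short sketch. It is a hypothetical illustration only, not the Committee's actual procedure (which is documented in Appendix D): the function name `assign_batteries`, the child identifiers, the placeholder test names, and the round-robin assignment are all assumptions. Within one sample unit it gives each battery to an equal share of the children, places the two anchor tests first (Binet, then WPPSI), and reverses the within-battery order for half of the children receiving each battery.

```python
from typing import Dict, List

ANCHORS = ["Binet", "WPPSI"]  # anchor tests, always given first and in this order


def assign_batteries(children: List[str],
                     batteries: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the ordered test schedule for each child in one sample unit."""
    names = list(batteries)                # battery labels in the order supplied
    schedule = {}
    for i, child in enumerate(children):
        name = names[i % len(names)]       # round-robin keeps battery groups equal in size
        tests = list(batteries[name])
        # Every second pass through the batteries uses the reversed order,
        # so half of the children in each battery get each order.
        if (i // len(names)) % 2 == 1:
            tests.reverse()
        schedule[child] = ANCHORS + tests
    return schedule


# Hypothetical batteries with placeholder test names (not the study's tests).
batteries = {
    "I": ["A1", "A2", "A3"],
    "II": ["B1", "B2", "B3"],
    "III": ["C1", "C2", "C3"],
    "IV": ["D1", "D2", "D3"],
}
children = [f"child{n:02d}" for n in range(8)]
schedule = assign_batteries(children, batteries)
```

With eight children, two receive each battery, one in the original order and one in the reversed order, which is the counterbalancing idea the text describes.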




When feasible only one child was to be tested in any room at 
one time, and no testing session was to exceed ninety minutes per 
day for any child. These two controls were designed respectively 
to minimize interference during the testing situation and to re- 
duce the possibility of fatigue. No child was to be tested more 
than three sessions in a given week, but each child was to be
administered the anchor tests and the appropriate battery within one
month.

Periodic observations of each tester were made in the field, 
and any deficiencies noted were verified by a second observer and 
remedied without delay. The Committee on Educational Research took 
steps to assure the quality of the data to be collected by training 
all testers to specified criteria and periodically evaluating their 
performance in the field to ascertain that the test administration 
criteria were met continually. See Appendix E for a detailed
description of procedures used in selecting and training testers.
Instruments used in the routine evaluation of testers in training and
in the field and the conditions in which the testing took place are
in Appendix F. Also included in Appendix F are comments from a
report by the Quality Control division concerning the performance of
a tester in a typical testing situation.

A third area requiring the development of special procedures 
was the actual administration of the various test items. Each test 
was to be administered to each child on an individual basis, but 
there was a general consensus that disadvantaged youngsters have 
communication problems in this type of situation. The administration
of items according to the test manual's specifications perhaps
would very often result in a failure to respond because the child
did not understand the test item. This problem led to the development
of what was termed "Maximum Performance Testing." The examiner
would probe for responses beyond the specifications of the test
author's instructions but within the context of the basic intent
of the item. This procedure was believed to maximize to whatever
extent was possible the likelihood that the youngster would respond
if he were capable of responding. The rationale and procedure for
"Maximum Performance Testing" are presented in Appendix G.

Once the data from a particular test had been obtained for a 
child, it was immediately scored and recorded on data sheets in 
preparation for transfer to computer cards. Control procedures 
were maintained to insure that the data remained free from scoring 
and clerical error. These procedures are included in some detail 
in Appendix H.

ANALYSIS OF THE DATA

Rationale and Procedures

The general research question was concerned with the possibility
of similarities in the order of development and the levels of achieve- 
ment of problem-solving behaviors in children of different backgrounds. 
For purposes of the analysis of the data, the general question was
sub-divided into the following more specific questions: (1) Do
advantaged and disadvantaged children perform similarly with respect
to the relative order in which they acquire problem-solving behaviors?
and (2) Do advantaged and disadvantaged children perform similarly
with respect to average group scores on test item sets designed to
measure problem-solving behaviors? The latter question was truly a
subsidiary one, since the mean score performance of advantaged
and disadvantaged youngsters is known to differ fairly consistently
in favor of the advantaged.
The wealth of information available in the present study, however,
was such as to indicate the advisability of a systematic comparison
through all of the item sets. The former question dealing with the
relative order in which these behaviors are acquired was the central
question and served as the basis for the possible identification of
common scales.

The general strategy of the research required the application
of an analysis procedure which would result in the production of
estimates of scaling parameters for items within item sets. These
scaling parameters would be indicative of the similarity of sequencing
within advantaged and disadvantaged subpopulations. The identification
of common sequencing across subpopulations within item sets
would serve as the basis for the identification of task types which
would be common for both groups.

In addition, the problem of more precise measurement of the effects
of various curriculum intervention techniques was considered.
It is known that existing measurements often fail to show that
educational experiences for young children result in significant movement
on the traits that published instruments purport to measure.
This is particularly true in the case of disadvantaged children. It
was the view of the present researchers that one of the principal
reasons for such failure was related to the inadequacy of present
instruments to locate youngsters with respect to an underlying continuum.
If the item sets could be scaled within the structure of some
scaling model so as to produce measurements that were of interval
scale strength, then the accuracy of the measurements taken for
disadvantaged children might be enhanced and potentially the effects of
intervention procedures might be better identified. Analysis procedures
were developed which would be applied to the individual item
sets in order to achieve the above results.

The following steps were taken for each of the several item sets.
First, the item sets were subjected to the scaling model analysis
separately for advantaged and disadvantaged children. (See "The
Analysis Procedures," Appendix I, for a full description of the
analytical model.) The results of these initial analyses included
reliability and item scaling parameter estimates. Additionally, 
the analyses indicated the extent to which particular items within 
a set fit the scaling model and might be considered to be measures 
of the continuum underlying the set. 

In the case of each item set, those items which fit the model
sufficiently well for the disadvantaged children were identified.
Then those items which fit the model for the advantaged children
were identified. These two sets of items were then compared to
determine which items fit the model in both the case of the advantaged
and the disadvantaged. These "commonly-fitting" items were
then re-submitted to the scaling analysis procedures, which generated
new reliability and item scaling parameter estimates.

Two criteria were established to determine whether or not a 
particular item set at this point would be retained as indicative 
of commonality of sequential development for advantaged and dis- 
advantaged children. The criteria were as follows: 

1. The lower limit of the 95 percent confidence interval of
the Kuder-Richardson 20 reliability estimate must be at least .70.

2. The lower limit of the 95 percent confidence interval of
the correlation between the easiness parameter estimates for the
items obtained from disadvantaged and advantaged subpopulations
must be at least .80.
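
The two criteria can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the analysis program used in the study (which is described in Appendix I): the function names `kr20` and `retain_as_common` are hypothetical, the KR-20 computation uses the population variance of the raw scores, and the confidence limits are taken as given inputs because the method used to compute them is not stated in this section.

```python
from statistics import pvariance
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """Kuder-Richardson 20 for a children-by-items matrix of 0/1 item scores."""
    n_children = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # assumption: population variance of raw scores
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_children  # proportion passing item j
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_var)


def retain_as_common(kr20_lower_limits: List[float],
                     easiness_corr_lower: float) -> bool:
    """Apply the two retention criteria to already-computed lower 95% limits.

    kr20_lower_limits: lower confidence limit of KR-20 for each group.
    easiness_corr_lower: lower confidence limit of the correlation between
    the two groups' item easiness estimates.
    """
    reliable = all(limit >= 0.70 for limit in kr20_lower_limits)
    same_order = easiness_corr_lower >= 0.80
    return reliable and same_order


# Tiny made-up response matrix (4 children, 3 items) to exercise kr20.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_kr20 = kr20(toy)
```

An item set is kept as a common scale only when both conditions hold; failing either one sends the set to the separate analysis for disadvantaged children described below.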

The next step in the analysis procedures was the consideration
of the development of interval score conversion tables. For item
sets that had been retained as indicative of universality across
subpopulations, the interval score conversions from raw scores were
reduced to positive integer values. This was done so that the in- 
terval scores might be conveniently used for the locating of in- 
dividuals with respect to the continuum which the item set was 
presumed to reflect. 

As the investigators were also concerned with the measurement 
of problem-solving development in disadvantaged children, the item 
sets which had failed to scale in the same way for both groups were 
analyzed separately for the disadvantaged children. That is to say, 
the items which were judged to fit the model after the first analysis 
for disadvantaged children only were re-analyzed in order to produce 
interval scale conversion parameters to provide more efficient measure- 
ment of disadvantaged children with respect to the continua which the 
various item sets were presumed to measure. The criterion used at
this point for retaining a particular set of items was the
Kuder-Richardson 20 reliability estimate.

Additionally, comparisons were made of the relative performance 
of advantaged and disadvantaged children at three points in the 
analysis procedures outlined above. First, a comparison of raw score
means was performed for each item set as it appeared intact at the
beginning of the analysis. A second comparison was performed on the
raw score means based on only those items that fit the model for both
groups after the first analysis. Finally, a comparison of the means
of the interval scores was performed following the analysis of the
items based on a combination of advantaged and disadvantaged children
as one analysis group.
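
The comparisons of means are reported later as z values. The exact statistic is not spelled out in this section, so the sketch below assumes the usual large-sample z for the difference between two independent means; the function name `z_difference` and all of the numbers in the example are hypothetical.

```python
import math


def z_difference(mean_a: float, sd_a: float, n_a: int,
                 mean_b: float, sd_b: float, n_b: int) -> float:
    """Large-sample z statistic for the difference between two group means."""
    # Standard error of the difference, treating the groups as independent.
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Made-up summary statistics, for illustration only.
z = z_difference(mean_a=60.0, sd_a=10.0, n_a=100,
                 mean_b=55.0, sd_b=10.0, n_b=100)
```

A positive value indicates that the first group's mean is higher; under this assumption, values well above 2 correspond to differences that would rarely arise by chance.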

Mode of Presentation 

A substantial number of item sets were generated through use
of the rationale and procedures described on the preceding pages.
All together, seventy-one sets of items were analyzed. Nine of
these resulted in the generation of scales which were common to
both the advantaged and disadvantaged children. Fifteen scaled
only for the disadvantaged group with acceptable reliability estimates
(KR-20 greater than .70). Thirty-two scales were identified for the
disadvantaged group, but reliability estimates became acceptable
only when projected on the basis of fifty items. Another seven
scales still had less than acceptable reliability estimates even
when projected to a group of fifty items. Finally, there were
eight scales which had too few items for further analysis after the
loss of most of the items because of failure to fit the model. The
nine common scales and the fifteen scales for the disadvantaged only
will be included in the present document.

To enhance the clarity of the presentation, those item sets
which scaled commonly for both the disadvantaged and the advantaged
with sufficient reliability are presented first. Those that scaled
for only the disadvantaged children follow in a separate grouping.
Within these groupings, the present investigators have used the
same sequence for organizing the information related to each set.

The sets are arbitrarily identified by the order of their presentation,
e.g., First Item Set, Second Item Set, etc. Information
concerning each set begins with the notation of the test from which 
the items were taken and a brief description of the item set. These 
descriptions may seem somewhat arbitrary to the reader but they have 
been included to allow for a general understanding of the item sets 
without continued reference to the appendices. This description is 
followed by an enumeration of the findings and a statement of the 
conclusions. The statistical data produced by the analyses related 
to each item set and verbal descriptions of the items are included 
in the same order in Appendix J. With respect to the verbal
descriptions presented in Appendix J, the reader can identify the test
and the particular item from the test by noting the I.D. Label and
referring to Appendix K. In the latter appendix, all 1,875 items
used in the study are listed by "I.D. Label," Anchor Group or Battery,
and item number in the tests. The tables necessary to convert the
raw scores for the twenty-four item sets to interval scores are
included in Appendix L.

Group I: Item Sets Common to Both Groups

First Item Set - Description: Caldwell Preschool Inventory.--The
Caldwell Preschool Inventory consists of 85 items separated into
three groups: Personal-social Responsiveness, Associative Vocabulary,
and Concept Activation.

The Personal-social Responsiveness dimension involves knowledge 
about the child’s own personal world, i.e., name, address, parts of 
body, friends, as well as the carrying out of simple and complex 
verbal instructions given by an adult. The Associative Vocabulary
dimension requires the ability to demonstrate knowledge of the con- 
notation of a word by carrying out some action related to it. This 
includes simple labeling of geometric figures, supplying verbal or 
gestural labels for certain functions, actions, events, and time 
sequences, and being able to describe verbally the essential charac- 
teristics of certain social roles. The Concept Activation dimension 
appears to represent two major categories: ordinal or numerical
relations, and sensory attributes such as form, color, size, shape, and
motion. It involves either being able to call on established concepts
to describe or compare attributes (relating shapes to objects,
color-names to objects or events) or to execute motorically some
kind of spatial concept (reproduction of geometric designs or drawing
the human figure).

First Item Set - Findings.--The scaling analysis of the 85 Caldwell
Preschool Inventory items showed a reliability for the disadvantaged
sample of .952 with 95 percent confidence limits of .963 and .940.
The reliability of these items for the advantaged sample was .934
with 95 percent confidence limits of .946 and .920. The number of
items meeting the model fit criterion was 67 for disadvantaged and
62 for advantaged children. Of these items, 49 were judged to fit the
model for both groups.

A comparison of the raw score means of the two groups based on
all items showed high statistical significance (z = 9.82) in favor
of the advantaged group.

The 49 commonly fitting items were analyzed separately for the
two groups, and showed a reliability for the disadvantaged group
sample of .937 with 95 percent confidence limits of .931 and .921.
The reliability of these items for the advantaged group sample was
.913, with 95 percent confidence limits of .930 and .894. Adjusted
to a base of 50 items, these reliabilities were, respectively, .938
with 95 percent confidence intervals of .932 and .923 and .913 with
95 percent confidence intervals of .931 and .896.

A comparison of the raw score means of the two groups based on
these items showed high statistical significance (z = 9.18) in favor
of the advantaged group.

Since the lower limit of the 95 percent confidence interval of
the reliability coefficient for each group was greater than .70,
namely .923 and .894 respectively, and since the lower limit of the
95 percent confidence interval of the correlation between the easiness
parameter estimates obtained from the two groups was greater
than .80, in this case .814, the 49 common items were analyzed by
combining the two groups into one. The reliability resulting for
these items was .937 with 95 percent confidence limits of .946 and
.927. The item difficulty indices showed a range from 26 percent
to 98 percent with a median value of approximately 78 percent. Adjusted
to a 50 item base the reliability was .938 with 95 percent confidence
limits of .947 and .928.
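
The adjustment to a 50-item base is not given as a formula in this section. The sketch below assumes it is the Spearman-Brown projection, which reproduces the combined-group figure reported above (.937 on 49 items becoming .938 on 50 items); the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability: float, n_items: int,
                   target_items: int = 50) -> float:
    """Project a reliability estimate from n_items to target_items."""
    k = target_items / n_items  # factor by which the item set is lengthened
    return k * reliability / (1.0 + (k - 1.0) * reliability)


# Combined-group reliability of the 49 common Caldwell items, from the text.
projected = spearman_brown(0.937, n_items=49)
```

Because 49 items is already close to the 50-item base, the projection changes the estimate only in the third decimal place; shorter item sets are raised by a larger amount.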

The raw scores were converted to interval scores according to 
the estimates obtained from the analysis of the two groups combined. 

A comparison of the difference between the interval score means 
showed that the advantaged group substantially out-performed the 
disadvantaged group (z = 8.29).

First Item Set - Conclusions.--The correlation between the 49 pairs
of item easiness parameter estimates derived from advantaged and dis- 
advantaged children was sufficiently high to support the contention 
that the two populations develop in the same order the competencies 
measured by the Caldwell items. The reliability estimates derived 
from the two groups were sufficiently high; hence, the items were 
analyzed and interval score conversions were produced on the basis 
of a single combined population. The resulting scale of the 49 items 
has a reliability coefficient with a lower 95 percent confidence bound 
of .927 and a reasonably good range and distribution of item difficulties. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is 
true whether the comparison is based upon the means of the original 
85 items, upon the means of the 49 items that fit the scaling model 
for both groups, or upon the means of the interval scores derived 
from the combined analysis.

Second Item Set - Description: Primary Mental Abilities, Perceptual
Speed.--The Perceptual Speed subtest of the Primary Mental
Abilities test contains 28 items. Each item consists of a picture
of an object or symbol followed by four pictures of similar objects 
or symbols. The task is to select one of the four pictures which is 
exactly like the stimulus picture. While the original subtest was 
intended to be timed (hence the subtest title) , it was not timed 
when administered for our purposes. Thus, this subtest could be 
said to offer a measure of the ability to recognize likenesses and 
differences between objects or symbols accurately, but without re- 
gard to quickness. An obvious necessity for success in this task 
is good visual discrimination. 

Second Item Set - Findings.--The scaling analysis of the 28 Primary
Mental Abilities Perceptual Speed items showed a reliability for the
disadvantaged sample of .885 with 95 percent confidence limits of
.887 and .819. The reliability of these items for the advantaged
sample was .834 with 95 percent confidence limits of .868 and .796.
The number of items meeting the model fit criterion was 25 for
disadvantaged and 25 for advantaged children. Of these items, 23 were
judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 7.88) in favor 
of the advantaged group. 

The 23 commonly fitting items were analyzed separately for the
two groups, and showed a reliability for the disadvantaged group
sample of .835 with 95 percent confidence limits of .873 and .792.
The reliability of these items for the advantaged group sample was
.799, with 95 percent confidence limits of .843 and .749. Adjusted
to a base of 50 items, these reliabilities were, respectively, .917,
with an upper 95 percent confidence limit of .936, and .896, with
95 percent confidence limits of .919 and .871.

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 7.86) in favor
of the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for each group was greater than .70, 
namely .792 and .749, respectively, and since the lower limit of the 
95 percent confidence interval of the correlation between the easiness 
parameter estimates obtained from the two groups was greater than .80, 
in this case .844, the 23 common items were analyzed by combining the 
two groups into one. The reliability resulting for these items was 
.838 with 95 percent confidence limits of .864 and .810. The item 
difficulty indices showed a range from 48 percent to 91 percent with
a median value of approximately 75 percent. Adjusted to a 50 item 
base the reliability was .918 with 95 percent confidence limits of 
.931 and .904. 

The raw scores were converted to interval scores according to 
the estimates obtained from the analysis of the two groups combined. 

A comparison of the difference between the interval score means showed 
that the advantaged group substantially out-performed the disadvantaged 
group (z = 5.93).

Second Item Set - Conclusions.--The correlation between the 23 pairs
of item easiness parameter estimates derived from advantaged and
disadvantaged children was sufficiently high to support the con- 
tention that the two populations develop perceptual competencies 
in the same order. The reliability estimates derived from the two 
groups were sufficiently high; hence, the items were analyzed and 
interval score conversions were produced on the basis of a single 
combined population. The resulting scale of the 23 items has a re- 
liability coefficient with a lower 95 percent confidence bound of 
.810, but the item difficulties are limited to the easy half of the
range.

The data indicate that the advantaged children outperform
those of the disadvantaged group to a very great extent. This fact
is true whether the comparison is based upon the means of the original
28 items, upon the means of the 23 items that fit the scaling model
for both groups, or upon the means of the interval scores derived
from the combined analysis.

Third Item Set - Description: Primary Mental Abilities, Number
Facility.--The Number Facility subtest of the Primary Mental Abilities
test contains 27 items, all of which are presented to the subject
verbally. Each item consists of a picture on which are a number
of similar objects. At the lower level the child is simply required
to count, e.g., (1) Point to THREE scissors and (2) Point
to SIX sprinkling cans. At the intermediate level he is required
to handle non-numerical quantities and serial position, e.g., (11)
Point to MOST of the forks and (12) Point to the NEXT TO THE LAST
[...]. At the upper level he is required to do simple arithmetic
reasoning, e.g., (26) Betty was playing with her doll buggy.
[...] little girls came with their doll buggies. How many
doll buggies were there then? Point to them. (27) If I blow out
these candles, how many will still be lit? Point to them.

In summary, this subtest appears to tap the ability to use
number concepts, to solve simple quantitative problems, and to
[...] and recognize quantitative differences.

Third Item Set - Findings.--The scaling analysis of the 27 Primary
Mental Abilities, Number Facility items showed a reliability for
the disadvantaged sample of .917 with 95 percent confidence limits
of [...] and .895. The reliability of these items for the advantaged
sample was .937 with 95 percent confidence limits of .949 and .924.
The number of items meeting the model fit criterion was 16 for
disadvantaged and 19 for advantaged children. Of these items, 13 were
judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on
all items showed high statistical significance (z = 9.59) in favor of
the advantaged group.

The 13 commonly fitting items were analyzed separately for the
two groups, and showed a reliability for the disadvantaged group sample
of .890 with 95 percent confidence limits of .918 and .858. The reliability
of these items for the advantaged group sample was .874, with
95 percent confidence limits of .903 and .841. Adjusted to a base of
50 items, these reliabilities were, respectively, .969 with 95 percent
confidence intervals of .977 and .960; and .964 with 95 percent confidence
intervals of .972 and .955.

A comparison of the raw score means of the two groups based on
the items showed high statistical significance (z = 8.45) in favor of
the advantaged group.

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for each group was greater than .70, 
namely .858 and .841, respectively, and since the lower limit of the 
95 percent confidence interval of the correlation between the easiness 
parameter estimates obtained from the two groups was greater than .80, 
in this case .895, the 13 common items were analyzed by combining the
two groups into one. The reliability resulting for these items was 
.890 with 95 percent confidence limits of .909 and .869. The item 
difficulty indices showed a range from 19 percent to 90 percent with a 
median value of approximately 65 percent. Adjusted to a 50 item base 
the reliability was .969 with 95 percent confidence limits of .974 and 
.963. 

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A 
comparison of the difference between the interval score means showed 
that the advantaged group substantially out-performed the disadvantaged 
group (z = 4.34). 

Third Item Set - Conclusions.--The correlation between the 13 pairs of
item easiness parameter estimates derived from advantaged and disadvantaged
children was sufficiently high to support the contention that
the two populations develop number facility competencies in the same 
order. The reliability estimates derived from the two groups were 
sufficiently high; hence, the items were analyzed and interval score 
conversions were produced on the basis of a single combined population.
The resulting scale of the 13 items has a reliability coefficient with a
lower 95 percent confidence bound of .869 and a good range and
distribution of item difficulties.

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 27
items, upon the means of the 13 items that fit the scaling model for
both groups, or upon the means of the interval scores derived from
the combined analysis.

Fourth Item Set - Description: Columbia Mental Maturity Scale.--The
Columbia Mental Maturity Scale contains 100 items arranged in order of 
difficulty. The first 57 of these items were used in the present 
study. Each item is printed on a separate card and consists of a 
series of from three to five drawings. The task is to select from 
the series of drawings on each card the one which is different from, 
or unrelated to, the others in the series. Bases for discrimination 
involve differences in color, shape, size, function, number, kind, 
missing parts, and symbolic material. Since the test requires no 
verbal response and only a minimal motor response it should be quite 
useful for physically handicapped children. Adequate visual discrimi- 
nation would seem to be prerequisite to success on this test. 

Fourth Item Set - Findings.--The scaling analysis of the 57 Columbia
Mental Maturity Scale items showed a reliability for the disadvantaged
sample of .954 with 95 percent confidence limits of .964 and .943.
The reliability of these items for the advantaged sample was .899 with
95 percent confidence limits of .919 and .877. The number of items
meeting the model fit criterion was 47 for disadvantaged and 47 for
advantaged children. Of these items, 41 were judged to fit the model for
both groups.

A comparison of the raw score means of the two groups based on
all items showed high statistical significance (z = 8.96) in favor of
the advantaged group.

The 41 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group sample 
of .944 with 95 percent confidence limits of .957 and .929. The reli- 
ability of these items for the advantaged group sample was .890, with 
95 percent confidence limits of .919 and .856. Adjusted to a base of 
50 items, these reliabilities were, respectively, .954 with 95 percent 
confidence intervals of .965 and .941; and .908 with 95 percent confi- 
dence intervals of .933 and .880. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 7.78) in favor of 
the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for each group was greater than .70, namely
.929 and .856, respectively, and since the lower limit of the 95 percent
confidence interval of the correlation between the easiness parameter
estimates obtained from the two groups was greater than .80, in
this case .824, the 41 common items were analyzed by combining the two 
groups into one. The reliability resulting for these items was .942 
with 95 percent confidence limits of .953 and .930. The item diffi- 
culty indices showed a range from 55 percent to 94 percent with a 
median value of approximately 89 percent. Adjusted to a 50 item base 
the reliability was .952 with 95 percent confidence limits of .961 
and .942.

The raw scores were converted to interval scores according to
the estimates obtained from the analysis of the two groups combined.
A comparison of the difference between the interval score means showed
that the advantaged group substantially out-performed the disadvantaged
group (z = 6.55).

Fourth Item Set - Conclusions.--The correlation between the 41 pairs
of item easiness parameter estimates derived from advantaged and
disadvantaged children was sufficiently high to support the contention
that the two populations develop in the same order the competencies 
measured by these items. The reliability estimates derived from the 
two groups were sufficiently high; hence, the items were analyzed and 
interval score conversions were produced on the basis of a single 
combined population. The resulting scale of the 41 items has a lower 
95 percent confidence bound of .930, but the item difficulties are 

limited to the es jy half of thtr twinge * 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 57 
items, upon the means of the 41 items that fit the scaling model for 
both groups, or upon the means of the interval scores derived from the
combined analysis.

Fifth Item Set - Description: Draw-A-Person Test.--The Draw-A-Person
Test is perhaps the most unusual of the many tests of general ability
in terms of basic conception, brevity, and convenience. The child
is simply given a pencil and paper and told to "make a picture of
a person. Make the very best picture you can; take your time and work
very carefully."

Scoring is primarily concerned with the ideas portrayed in the
drawings rather than with the technical skill of the drawings. There
is no interest in evaluating artistic skill, as such. Inclusion and
accuracy of detail, and proportion are the important factors.

The Draw-A-Person Test might be said to tap cognitive and psychomotor
skills, particularly the ability to form concepts of increasingly
abstract character. Subsumed under these skills would be:

(1) the ability to perceive, i.e., to discriminate likenesses
and differences,

(2) the ability to abstract, i.e., to classify objects according
to such likenesses and differences, and

(3) the ability to generalize, i.e., to assign a new object to
a correct class, according to discriminated features,
properties, or attributes.

The Draw-A-Person Test appears to be appropriate for children 
from ages 4 to 14. After about age 14 Draw-A-Person Test scores 
cease to show increments. 

Fifth Item Set - Findings.--The scaling analysis of the 73 items showed
a reliability for the disadvantaged sample of .887 with 95 percent
confidence limits of .912 and .860. The reliability of these items for
the advantaged sample was .900 with 95 percent confidence limits of
.920 and .878. The number of items meeting the model fit criterion
was 44 for disadvantaged and 57 for advantaged children. Of these
items, 37 were judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on
all items showed high statistical significance (z = 5.85) in favor of
the advantaged group.

The 37 commonly fitting items were analyzed separately for the two
groups, and showed a reliability for the disadvantaged group sample of
.830 with 95 percent confidence limits of .867 and .788. The reliability
of these items for the advantaged group sample was .858, with 95 percent
confidence limits of .886 and .827. Adjusted to a base of 50 items,
these reliabilities were, respectively, .868 with 95 percent confidence
intervals of .897 and .836; and .891 with 95 percent confidence intervals
of .913 and .867.

A comparison of the raw score means of the two groups based on the 
items showed high statistical significance (z = 5.31) in favor of the 
advantaged group. 

Since the lower limit of the 95 percent confidence interval of the 
reliability coefficient for each group was greater than .70, namely 
.788 and .827, respectively, and since the lower limit of the 95 percent 
confidence interval of the correlation between the easiness parameter 
estimates obtained from the two groups was greater than .80, in this 
case .927, the 37 common items were analyzed by combining the two groups 
into one. The reliability resulting for these items was .852 with 95 
percent confidence limits of .879 and .828. The item difficulty indices
showed a range from 1 percent to 96 percent with a median value of
approximately 9 percent. Adjusted to a 50 item base the reliability was
.886 with 95 percent confidence limits of .903 and .868.

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A
comparison of the difference between the interval score means showed
that the advantaged group substantially out-performed the disadvantaged 
group (z = 4.14).

Fifth Item Set - Conclusions.--The correlation between the 37 pairs of
item easiness parameter estimates derived from disadvantaged and
advantaged children was sufficiently high to support the contention that
the two populations develop in the same order the competencies measured
by these items. The reliability estimates derived from the two groups
were sufficiently high; hence, the items were analyzed and interval score
conversions were produced on the basis of a single combined population.
The resulting scale of the 37 items has a reliability coefficient with a
lower 95 percent confidence bound of .828 and a good range of item
difficulties. These indices, however, tend to the very difficult part
of the range.

The data indicate that the advantaged children outperform those
of the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 73 items,
upon the means of the 37 items that fit the scaling model for both groups,
or upon the means of the interval scores derived from the combined
analysis.

Sixth Item Set - Description: Marianne Frostig Developmental Test of
Visual Perception.--The Marianne Frostig Developmental Test of Visual
Perception employs five different types of items. The eye-motor
coordination items require the subject to draw lines either within
specified boundaries or between specified points. Some of the lines are
to be straight, some curved, some angled. The figure-ground items require
the subject to outline certain figures, e.g., stars, crosses, ovals,
etc., that are printed within increasingly complex grounds. The
constancy of shape items require the subject to identify certain figures,
e.g., circles, squares, parallelograms, etc., that are presented in
various positions, sizes, shadings, etc. The position in space items
require the subject to identify the drawings of common objects that have
been rotated or reversed in the context of a series of such objects.
The spatial relationships items require the subject to copy forms and
patterns using dots as orienting ground. Altogether there are 72
items that measure visual-perceptual and motor coordination ability.

Sixth Item Set - Findings.--The scaling analysis of the 72 items showed
a reliability for the disadvantaged sample of .904 with 95 percent
confidence limits of .933 and .870. The reliability of these items for
the advantaged sample was .916 with 95 percent confidence limits of 
.931 and .899. The number of items meeting the model fit criterion 
was 38 for disadvantaged and 39 for advantaged children. Of these 
items, 21 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 7.49) in favor of the 
advantaged group. 

The 21 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.798 with 95 percent confidence limits of .843 and .747. The reliability 
of these items for the advantaged group sample was .787, with 95 percent 
confidence limits of .827 and .742. Adjusted to a base of 50 items, 
these reliabilities were, respectively, .904 with 95 percent confidence 
intervals of .925 and .880; and .898 with 95 percent confidence intervals
of .917 and .877.

A comparison of the raw score means of the two groups based on
the items showed high statistical significance (z = 6.57) in favor of
the advantaged group.

Since the lower limit of the 95 percent confidence interval of the 
reliability coefficient for each group was greater than .70, namely 
.747 and .742, respectively, and since the lower limit of the 95 per- 
cent confidence interval of the correlation between the easiness para- 
meter estimates obtained from the two groups was greater than .80, in 
this case .866, the 21 common items were analyzed by combining the two 
groups into one. The reliability resulting for these items was .810 
with 95 percent confidence limits of .838 and .780. The item diffi- 
culty indices showed a range from 13 percent to 98 percent with a median 
value of approximately 74 percent. Adjusted to a 50 item base the re- 
liability was .910 with 95 percent confidence limits of .923 and .896. 
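
The report does not say how the confidence limits on the correlation between the two sets of easiness parameter estimates were obtained. One common approach, assumed here purely for illustration, is the Fisher z transformation. The function names and the easiness values below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n_pairs, z_crit=1.96):
    """Approximate 95 percent confidence limits for r (Fisher z; an assumption)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n_pairs - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # Hypothetical easiness estimates for the same five items in each group.
    disadvantaged = [-1.2, -0.4, 0.1, 0.9, 1.5]
    advantaged = [-1.0, -0.6, 0.3, 0.7, 1.8]
    r = pearson_r(disadvantaged, advantaged)
    lower, upper = fisher_ci(r, len(disadvantaged))
    print(f"r = {r:.3f}, 95% limits = ({lower:.3f}, {upper:.3f})")
```

With few item pairs the interval is wide, which is one reason a lower-limit criterion such as .80 is demanding for short item sets.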

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A com- 
parison of the difference between the interval score means showed that 
the advantaged group substantially out-performed the disadvantaged 
group (z = 6.21).

Sixth Item Set - Conclusions.--The correlation between the 21 pairs of
item easiness parameter estimates derived from disadvantaged and
advantaged children was sufficiently high to support the contention that
the two populations develop in the same order the competencies measured
by these items. The reliability estimates derived from the two groups
were sufficiently high; hence, the items were analyzed and interval
score conversions were produced on the basis of a single combined
population. The resulting scale of the 21 items has a lower 95 percent
confidence bound of .780 and a good range of item difficulties. These
indices, however, tend to distribute to the easy end of the scale.

The data indicate that the advantaged children outperform those
of the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 72
items, upon the means of the 21 items that fit the scaling model for
both groups, or upon the means of the interval scores derived from the
combined analysis.

Seventh Item Set - Description: Stanford-Binet Intelligence Scale,
Form L-M.--The Stanford-Binet Intelligence Scale, Form L-M, consists
of items that represent a heterogeneous set of tasks. For the purposes
of the present study, items ranging from year II to year VII, inclusive,
served as the basis for testing. The tasks these items represent are
verbal, non-verbal, and manipulative. Examples of verbal item types are
vocabulary, similarity and differences, comprehension, etc. Non-verbal
items include delayed memory for objects and pictures, identification
of objects by use, visual discrimination of similar pictures, etc.
Manipulative items include button sorting, paper folding, maze tracing,
and the like. Cultural bias is probably a factor affecting the scores
on these items because of the verbal emphasis and type of content that
the items represent.

The particular way in which the administration of the tests in this
study was accomplished resulted in a total number of items that exceeds
the number indicated in the standard version of the Binet. For example,
items that normally require fewer correct responses for credit than the
number of stimuli were administered in their entirety in each case and
were scored as if each stimulus was a separate item. Hence, the total
number of items associated with this test in this study is 216.

Because the capacity of the scaling program did not permit the 
analysis of more than ninety-nine items at a time, a division of the 
Binet items into subgroups was necessary. The item set currently 
under consideration consists of items derived from Binet items IV-2 
through VII-A and also includes the first vocabulary items. 
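
The two procedural steps just described, scoring each stimulus as a separate item and dividing the result into sets of no more than 99 items, can be sketched as follows. This is an illustration under stated assumptions: the item labels and stimulus counts are hypothetical, and the report divided the Binet items by level (for example, IV-2 through VII-A) rather than by the simple sequential chunking shown here.

```python
def expand_items(items):
    """Turn (item_id, n_stimuli) pairs into one scored item per stimulus."""
    return [f"{item_id}.{k}" for item_id, n in items for k in range(1, n + 1)]


def split_into_sets(item_ids, capacity=99):
    """Split a list of items into consecutive sets no larger than capacity."""
    return [item_ids[i:i + capacity] for i in range(0, len(item_ids), capacity)]


if __name__ == "__main__":
    # Hypothetical items: one with three stimuli, one with a single stimulus.
    print(expand_items([("IV-2", 3), ("IV-3", 1)]))

    # 216 scored items cannot be analyzed in one run of a 99-item program.
    sets = split_into_sets(list(range(216)))
    print([len(s) for s in sets])
```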

Seventh Item Set - Findings.--The scaling analysis of the 99 items
showed a reliability for the disadvantaged sample of .963 with 95 percent
confidence limits of .967 and .959. The reliability of these
items for the advantaged sample was .947 with 95 percent confidence 
limits of .952 and .942. The number of items meeting the model fit 
criterion was 69 for disadvantaged and 62 for advantaged children. Of 
these items, 48 were judged to fit the model for both groups. 
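
The set of commonly fitting items is the intersection of the items that met the model fit criterion in each group. A minimal sketch, with hypothetical item numbers and a hypothetical function name (the report describes the items as "judged" to fit, so the actual selection may have involved more than a mechanical intersection):

```python
def common_fitting_items(fit_disadvantaged, fit_advantaged):
    """Return the items that met the fit criterion in both groups, sorted."""
    return sorted(set(fit_disadvantaged) & set(fit_advantaged))


if __name__ == "__main__":
    # Hypothetical item numbers, for illustration only.
    fit_disadvantaged = [1, 2, 3, 5, 8]
    fit_advantaged = [2, 3, 4, 8, 9]
    print(common_fitting_items(fit_disadvantaged, fit_advantaged))
```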

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 25.13) in favor of the 
advantaged group. 

The 48 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.939 with 95 percent confidence limits of .946 and .932. The reli- 
ability of these items for the advantaged group sample was .870, with 
95 percent confidence limits of .883 and .856. Adjusted to a base of 
50 items, these reliabilities were, respectively, .941 with 95 percent
confidence intervals of .948 and .935; and .875 with 95 percent
confidence intervals of .887 and .861.

A comparison of the raw score means of the two groups based on
the items showed high statistical significance (z = 22.90) in favor
of the advantaged group.

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for each group was greater than .70, namely 
.932 and .856, respectively, and since the lower limit of the 95 percent 
confidence interval of the correlation between the easiness parameter 
estimates obtained from the two groups was greater than .80, in this 
case .946, the 48 common items were analyzed by combining the two 
groups into one. The reliability resulting for these items was .942 
with 95 percent confidence limits of .946 and .938. The item diffi- 
culty indices showed a range from 10 percent to 97 percent with a median 
value of approximately 88 percent. Adjusted to a 50 item base the reli- 
ability was .944 with 95 percent confidence limits of .948 and .940. 

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A 
comparison of the difference between the interval score means showed 
that the advantaged group substantially out-performed the disadvantaged 
group (z = 24.54). 

Seventh Item Set - Conclusions.--The correlation between the 48 pairs
of item easiness parameter estimates derived from disadvantaged and
advantaged children was sufficiently high to support the contention
that the two populations develop competencies represented by these items
in the same order. The reliability estimates derived from the two
groups were also sufficiently high. Hence, the items were analyzed
and interval score conversions were produced on the basis of a single
combined population. The resulting scale of the 48 items has a lower
95 percent confidence bound of .938 and a good range but a poor
distribution of item difficulties; the items tend to be quite easy.

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 99 
items, upon the means of the 48 items that fit the scaling model for 
both groups, or upon the means of the interval scores derived from 
the combined analysis. 

Eighth Item Set - Description: WPPSI Picture Completion.--The WPPSI
Picture Completion test consists of 23 pictures, each of which has
some important part missing. The cards are presented to the child in
numerical order, and he is asked to name or indicate the missing part
on each card. Basic perceptual and conceptual abilities are involved
inasmuch as these are needed in the visual recognition and identification
of the objects presented. In a broader sense, the test might
be said to measure the ability to differentiate essential from
non-essential details in a visual stimulus. In order to see what is missing
from any particular picture, the subject must first know what that
picture represents. For this reason, subjects from limited
experiential backgrounds might do poorly on this test.

Eighth Item Set - Findings.--The scaling analysis of the 23 items
showed a reliability for the disadvantaged sample of .858 with 95 percent
confidence limits of .873 and .842. The reliability of these
items for the advantaged sample was .836 with 95 percent confidence
limits of .853 and .818. The number of items meeting the model fit
criterion was 16 for disadvantaged and 18 for advantaged children. Of
these items, 12 were judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on
all items showed high statistical significance (z = 25.87) in favor
of the advantaged group.

The 12 commonly fitting items were analyzed separately for the
two groups, and showed a reliability for the disadvantaged group sample
of .769 with 95 percent confidence limits of .795 and .742. The
reliability of these items for the advantaged group sample was .730, with
95 percent confidence limits of .758 and .700. Adjusted to a base of
50 items, these reliabilities were, respectively, .933 with 95 percent
confidence intervals of .940 and .925; and .919 with 95 percent
confidence intervals of .927 and .910.

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 23.63) in favor of 
the advantaged group.

Since the lower limit of the 95 percent confidence interval of
the reliability coefficient for each group was greater than .70, namely
.742 and .700, respectively, and since the lower limit of the 95 percent
confidence interval of the correlation between the easiness parameter
estimates obtained from the two groups was greater than .80, in
this case .926, the 12 common items were analyzed by combining the two
groups into one. The reliability resulting for these items was .809
with 95 percent confidence limits of .824 and .794. The item
difficulty indices showed a range from 5 percent to 98 percent with a median
value of approximately 61 percent. Adjusted to a 50 item base the
reliability was .946 with 95 percent confidence limits of .950 and .942.

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A
comparison of the difference between the interval score means showed
that the advantaged group substantially out-performed the disadvantaged
group.

Eighth Item Set - Conclusions.--The correlation between the 12 pairs of
item easiness parameter estimates derived from advantaged and
disadvantaged children was sufficiently high to support the contention that
the two populations develop in the same order the competencies measured
by these items. The reliability estimates derived from the two groups
were sufficiently high; hence, the items were analyzed and interval
score conversions were produced on the basis of a single combined
population. The resulting scale of the 12 items has a lower 95 percent
confidence bound of .794 and a good range and distribution of item
difficulties.

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 23 items, 
upon the means of the 12 items that fit the scaling model for both 
groups, or upon the means of the interval scores derived from the
combined analysis.

Ninth Item Set - Description: Minnesota Preschool Scale.--The Minnesota
Preschool Scale contains items that are quite heterogeneous in item
type. There are verbal, non-verbal, and manipulative items. Examples
of verbal items include comprehension, absurdities, vocabulary,
opposites, sample sentences, etc. Non-verbal items include discrimination
and recognition of forms, identification of missing parts in pictures,
etc. Manipulative items include imitative drawing, copying geometric
designs, block building, picture puzzles, paper folding, etc.

Because of the particular way in which test items were administered
and scored in this study, the 26 items of the standard Minnesota Scale
were scored as 89 separate items.

Cultural bias is probably a factor affecting the scores on these
items because of the verbal emphasis and type of content that the items
represent.

Ninth Item Set - Findings.--The scaling analysis of the 89 items showed
a reliability for the disadvantaged sample of .922 with 95 percent
confidence limits of .938 and .904. The reliability of these items for
the advantaged sample was .903 with 95 percent confidence limits of .922 
and .882. The number of items meeting the model fit criterion was 58 
for disadvantaged and 45 for advantaged children. Of these items 30 
were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all
items showed high statistical significance (z = 13.89) in favor of the
advantaged group.

The 30 commonly fitting items were analyzed separately for the two
groups, and showed a reliability for the disadvantaged group sample of
.867 with 95 percent confidence limits of .894 and .836. The
reliability of these items for the advantaged group sample was .827, with
95 percent confidence limits of .862 and .788. Adjusted to a base of
50 items, these reliabilities were, respectively, .916 with 95 percent
confidence intervals of .933 and .897; and .889 with 95 percent
confidence intervals of .911 and .864.

A comparison of the raw score means of the two groups based on
the items showed high statistical significance (z = 12.55) in favor of
the advantaged group.

Since the lower limit of the 95 percent confidence interval of
the reliability coefficient for each group was greater than .70,
namely .836 and .788, respectively, and since the lower limit of the
95 percent confidence interval of the correlation between the easiness
parameter estimates obtained from the two groups was greater than .80,
in this case .875, the 30 common items were analyzed by combining the
two groups into one. The reliability resulting for these items was
.894 with 95 percent confidence limits of .909 and .877. The item 
difficulty indices showed a range from 11 percent to 99 percent with 
a median value of approximately 77 percent. Adjusted to a 50 item base 
the reliability was .934 with 95 percent confidence limits of .943 and 
.923. 

The raw scores were converted to interval scores according to the 
estimates obtained from the analysis of the two groups combined. A 
comparison of the difference between the interval score means showed 
that the advantaged group substantially out-performed the disadvantaged 
group (z = 12.53). 

Ninth Item Set - Conclusions.--The correlation between the 30 pairs of
item easiness parameter estimates derived from disadvantaged and
advantaged children was sufficiently high to support the contention that
the two populations develop in the same order the competencies measured
by these items. The reliability estimates derived from the two groups
were sufficiently high; hence, the items were analyzed and interval
score conversions were produced on the basis of a single combined
population. The resulting scale of the 30 items has a reliability
coefficient with a lower 95 percent confidence bound of .877 and a good
range of item difficulties. These indices, however, tend to the easy
end of the range.

The data indicate that the advantaged children outperform those
of the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 89 items,
upon the means of the 30 items that fit the scaling model for both
groups, or upon the means of the interval scores derived from the
combined analysis.

Group 2: Item Sets Scaling for Disadvantaged Only

Tenth Item Set - Description: Primary Mental Abilities, Verbal
Meaning.--The Verbal Meaning subtest of the Primary Mental Abilities
test consists of 42 items, with each item consisting of 4 pictures.
At the lower level the items are simply picture vocabulary, e.g.,
(1) Point to the crown and (2) Point to the dome. At the upper
level the child must demonstrate the ability to understand ideas
expressed in words, e.g., (42) Early settlers could not get glass for
the windows of their cabins. They dipped paper in oil and used this
paper to cover the ______. Point to it. All items are read
to the children so that children with reading handicaps should not be
penalized. The pictures used for the items are rather small and
detailed, which makes good visual discrimination prerequisite for success.

Tenth Item Set - Findings.--The scaling analysis of the 42 Primary
Mental Abilities Verbal Meaning items showed a reliability for the
disadvantaged sample of .820 with 95 percent confidence limits of .859 and
.775. The reliability of these items for the advantaged sample was .869
with 95 percent confidence limits of .894 and .842. The number of items
meeting the model fit criterion was 28 for disadvantaged and 33 for
advantaged children. Of these items, 24 were judged to fit the model for
both groups.

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 12.09) in favor of the
advantaged group. 

The 24 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.768 with 95 percent confidence limits of .819 and .710. The reliability
of these items for the advantaged group sample was .785, with 95 percent
confidence limits of .826 and .739. Adjusted to a base of 50
items, these reliabilities were, respectively, .873 with 95 percent con- 
fidence intervals of .901 and .842; and .884 with 95 percent confidence 
intervals of .906 and .859. 

A comparison of the raw score means of the two groups based on the 
items showed high statistical significance (z = 10.13) in favor of the 
advantaged group. 

Since the lower limit of the 95 percent confidence interval of the 
easiness parameter correlation was less than .80, namely, .614, the 
items were not analyzed by combining the two groups into one. 
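
The criteria for analyzing the two groups as one, as applied throughout these findings, can be summarized in a short sketch. The thresholds (.70 for each group's reliability lower limit and .80 for the lower limit of the easiness parameter correlation) come from the text; the function name and signature are illustrative.

```python
def can_combine(rel_lower_a, rel_lower_b, corr_lower,
                rel_min=0.70, corr_min=0.80):
    """Apply the stated criteria for analyzing two groups as one.

    rel_lower_a, rel_lower_b: lower 95 percent limits of each group's reliability.
    corr_lower: lower 95 percent limit of the easiness parameter correlation.
    """
    return (rel_lower_a > rel_min
            and rel_lower_b > rel_min
            and corr_lower > corr_min)


if __name__ == "__main__":
    # Lower limits as reported in the text.
    print("fourth set:", can_combine(0.929, 0.856, 0.824))
    print("tenth set:", can_combine(0.710, 0.739, 0.614))
    print("eleventh set:", can_combine(0.635, 0.781, 0.790))
```

Rounded limits can sit exactly on a boundary (the eighth item set reports a lower limit of .700), so a strict comparison on rounded values may disagree with one made on the unrounded values.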

The 28 items which met the model fit criterion at the first scaling 
analysis for the disadvantaged sample were reanalyzed and showed a reli- 
ability of .775 with 95 percent confidence limits of .824 and .719. The 
item difficulty indices showed a range from 18 percent to 88 percent 
with a median value of approximately 60 percent. Adjusted to a 50 item 
base, the reliability was .860 with 95 percent confidence limits of 
.891 and .826. 
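
In these findings a high difficulty index corresponds to an easy item, which suggests the index is the percentage of children passing the item. Under that reading, the range and median reported for each item set could be computed as sketched below; the function names and the small response matrix are hypothetical.

```python
from statistics import median


def difficulty_indices(responses):
    """Percent of children passing each item.

    responses: one list of 0/1 item scores per child, all the same length.
    """
    n_children = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(child[j] for child in responses) / n_children
            for j in range(n_items)]


def summarize(indices):
    """Return (lowest, median, highest) difficulty index."""
    return min(indices), median(indices), max(indices)


if __name__ == "__main__":
    # Hypothetical scores for four children on three items.
    responses = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(summarize(difficulty_indices(responses)))
```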

Tenth Item Set - Conclusions.--The correlation between the 24 pairs of
item easiness parameter estimates derived from advantaged and disad- 
vantaged children was small enough to cast doubt on the contention that 
the two populations develop in the same order the competencies measured 
by these items. Hence, an analysis based on the combined groups was 
not made. 

The 28 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and 
produced a reliability coefficient with a lower 95 percent confidence
limit of .719. Because this coefficient was greater than .70, interval
scale conversions were made for the disadvantaged group. The
range and distribution of the item difficulties were good. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 42 
items or upon the means of the 24 items that fit the scaling model for 
both groups. 

Eleventh Item Set - Description: Primary Mental Abilities, Spatial
Relations.--The Spatial Relations subtest of the Primary Mental
Abilities test consists of 24 items. The first 12 items in this subtest
require the subject to select one of four geometric designs which, when
added to the stimulus design, will complete a square. This seems to
require the ability to see part-whole relationships in a visual
stimulus. The remaining 12 items consist of geometric designs paired with
similar, but incomplete, geometric designs. The child's task is to
complete the incomplete design using the completed design as a model.
Here again, the ability to see part-whole relationships in a visual
stimulus is required. In addition, the child must possess sufficient
eye-hand-motor coordination to utilize a pencil in completing the design.
For both parts of this subtest adequate visual discrimination is
presumed.

Eleventh Item Set - Findings.--The scaling analysis of the 24 Primary
Mental Abilities Spatial Relations items showed a reliability for the
disadvantaged sample of .860 with 95 percent confidence limits of .891
and .824. The reliability of these items for the advantaged sample
was .899 with 95 percent confidence limits of .918 and .878. The number
of items meeting the model fit criterion was 19 for disadvantaged and
15 for advantaged children. Of these items, 12 were judged to fit
the model for both groups.

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 9.59) in favor of 
the advantaged group. 

The 12 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.713 with 95 percent confidence limits of .780 and .635. The reliability 
of these items for the advantaged group sample was .821, with 95 percent
confidence limits of .857 and .781. Adjusted to a base of 50 items, 
these reliabilities were, respectively, .912 with 95 percent confidence 
intervals of .932 and .889; and .950 with 95 percent confidence inter- 
vals of .960 and .990. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 9.05) in favor of 
the advantaged group. 

Since the lower limit of the 95 percent confidence interval of the 
reliability coefficient for the disadvantaged group was less than .70, 
namely, .635, and since the lower limit of the 95 percent confidence 
interval of the easiness parameter correlation was less than .80, 
namely, .790, the items were not analyzed by combining the two groups 
into one. 

The 19 items which met the model fit criterion at the first
scaling analysis for the disadvantaged sample were reanalyzed and
showed a reliability of .829 with 95 percent confidence limits of .866
and .782. The item difficulty indices showed a range from 1 percent
to 74 percent with a median value of approximately 38 percent.
Adjusted to a 50 item base, the reliability was .926 with 95 percent
confidence limits of .943 and .908.

Eleventh Item Set - Conclusions.--The correlation between the 12 pairs
of item easiness parameter estimates derived from advantaged and disad- 
vantaged children was small enough to cast doubt on the contention 
that the two populations develop in the same order the competencies 
measured by these items. Also, the reliability estimate for the common
items for the disadvantaged group was too small to justify use of the 
items in a common analysis. 

The 19 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and pro- 
duced a reliability coefficient with a lower 95 percent confidence limit 
of .782. Because this coefficient was greater than .70, interval scale 
conversions were made for the disadvantaged group. The item difficulties
tended to the difficult end of the range.

The data indicate that the advantaged children outperform those of
the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 24 items
or upon the means of the 12 items that fit the scaling model for both
groups.

Twelfth Item Set - Description: ITPA Auditory-Vocal Association.--The
purpose of the Auditory-Vocal Association test of the ITPA is to assess
the child's ability to relate verbal symbols on a meaningful basis, in
this case by analogy. A sentence completion technique is employed in 
which the child is required to supply the analogous term. The test 
consists of 26 items, apparently intended to be in order of difficulty 
from easiest to hardest. Examples of the items are as follows:

1. I sit on a chair. I sleep on a ____.

13. A boy runs. An old man ____.

26. An ocean is deep. A pond is ____.

In scoring, only verbal responses are credited. Gestures receive 
no credit. Neither articulatory nor grammatical perfection is re- 
quired. The task is simply to supply the analogous missing word. Each 
item is presented verbally to the child and his response is also verbal,
so the effects of reading difficulties should be minimized.

Twelfth Item Set - Findings.--The scaling analysis of the 26 items showed
a reliability for the disadvantaged sample of .818 with 95 percent confi- 
dence limits of .854 and .778. The reliability of these items for the 
advantaged sample was .804 with 95 percent confidence limits of .842 and 
.762. The number of items meeting the model fit criterion was 17 for 
disadvantaged and 22 for advantaged children. Of these items 14 were 
judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 13.58) in favor of the 
advantaged group.

The 14 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.760 with 95 percent confidence limits of .809 and .705. The reli- 
ability of these items for the advantaged group sample was .742 with 
95 percent confidence limits of .795 and .682. Adjusted to a base of 
50 items, these reliabilities were, respectively, .919 with 95 percent 
confidence intervals of .935 and .901; and .911 with 95 percent confi- 
dence intervals of .929 and .891. 

A comparison of the raw score means of the two groups based on the
items showed high statistical significance (z = 14.12) in favor of the 
advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for the advantaged group was less than .70, 
namely, .682, the items were not analyzed by combining the two groups 
into one.

The 17 items which met the model fit criterion at the first scaling 
analysis for the disadvantaged sample were reanalyzed and showed a reli- 
ability of .786 with 95 percent confidence limits of .829 and .738. The 
item difficulty indices showed a range from 3 percent to 96 percent with 
a median value of approximately 51 percent. Adjusted to a 50 item base 
the reliability was .915 with 95 percent confidence limits of .932 and
.896. 

Twelfth Item Set - Conclusions.--The reliability estimate for the common
items for the advantaged group was too small to justify use of the items 
in a common analysis. 

The 17 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and pro- 
duced a reliability coefficient with a lower 95 percent confidence limit 
of .738. Because this coefficient was greater than .70, interval scale 
conversions were made for the disadvantaged group. The range and distri- 
bution of the item difficulties were good. 

The data indicate that the advantaged children outperform those of 
the disadvantaged group to a very great extent. This fact is true whether 
the comparison is based upon the means of the original 26 items or upon 
the means of the 14 items that fit the scaling model for both groups. 

Thirteenth Item Set - Description: ITPA Auditory Decoding Test.--The
Auditory Decoding test of the ITPA assesses the child's ability to
comprehend the spoken word. It is assessed by a controlled vocabulary
test in which the child is asked to indicate yes or no, either by
voice or gesture, whether or not a word has been used correctly. The
child does not have to define the word.

Examples of these questions are as follows: 

1. Do you smoke? 

5. Do babies eat? 

14. Do children climb? 

24. Do penguins wobble? 

32. Do carbohydrates nourish? 

33. Do meteorites collide? 

There are 36 such items, apparently intended to be in order of 
difficulty from easiest to most difficult. Since it is only necessary 
for the child to nod yes or no to each item, the effects of reading 
and vision handicaps should be minimized. 

Thirteenth Item Set - Findings.--The scaling analysis of the 36 items
showed a reliability for the disadvantaged sample of .876 with 95 percent
confidence limits of .901 and .849. The reliability of these items for 
the advantaged sample was .859 with 95 percent confidence limits of 
.886 and .829. The number of items meeting the model fit criterion 
was 25 for disadvantaged and 24 for advantaged children. Of these 
items, 15 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 15.61) in favor of the 
advantaged group.

The 15 commonly fitting items were analyzed separately for the
two groups, and showed a reliability for the disadvantaged group sample 
of .802 with 95 percent confidence limits of .863 and .728. The reli- 
ability of these items for the advantaged group sample was .710, with 
95 percent confidence limits of .770 and .642. Adjusted to a base of
50 items, these reliabilities were, respectively, .931 with 95 percent
confidence intervals of .952 and .906; and .891 with 95 percent confi- 
dence intervals of .913 and .866. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 13.59) in favor of 
the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for the advantaged group was less than .70, 
namely, .642, and since the lower limit of the 95 percent confidence 
interval of the easiness parameter correlation was less than .80, 
namely, .167, the items were not analyzed by combining the two groups 
into one. 

The 25 items which met the model fit criterion at the first scaling 
analysis for the disadvantaged sample were reanalyzed and showed a
reliability of .851 with 95 percent confidence limits of .881 and .818.
The item difficulty indices showed a range from 4 percent to 94 percent
with a median value of approximately 16 percent. Adjusted to a 50 
item base the reliability was .920 with 95 percent confidence limits 
of .935 and .902. 

Thirteenth Item Set - Conclusions.--The correlation between the 15
pairs of item easiness parameter estimates derived from advantaged and 
disadvantaged children was small enough to cast doubt on the contention 
that the two populations develop in the same order the competencies
measured by these items. Also the reliability estimate for the common 
items for the advantaged group was too small to justify use of the 
items in a common analysis.

The 25 items that met the model fit criterion for the disadvan- 
taged group at the first analysis were reanalyzed for that group only 
and produced a reliability coefficient with a lower 95 percent
confidence limit of .818. Because this coefficient was greater than .70,
interval scale conversions were made for the disadvantaged group. The 
range of the item difficulties was good, but the distribution tended to
the difficult end of the range.

The data indicate that the advantaged children outperform those of 
the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 36 items 
or upon the means of the 15 items that fit the scaling model for both
groups.

Fourteenth Item Set - Description: ITPA Visual-Motor Sequencing Test.--
The Visual-Motor Sequencing test of the ITPA assesses the ability of
the child to correctly reproduce a sequence of symbols previously seen. 
Short-term memory for visual stimuli is tested by requiring the child 
to duplicate the order of a sequence of pictures or geometrical designs 
presented to him and then removed. Each item utilizes a certain number 
and type of picture or form chips and a tray in which to arrange them 
in a given sequence. The examiner places a given set of chips in a 
certain sequence in the tray, allows the child to observe this sequence 
for five seconds, dumps the chips out and requires the child to duplicate
the sequence. There are 15 such items arranged in order of increasing
difficulty.

Fourteenth Item Set - Findings.--The scaling analysis of the 15 items
showed a reliability for the disadvantaged sample of .822 with 95 per- 
cent confidence limits of .856 and .782. The reliability of these 
items for the advantaged sample was .754 with 95 percent confidence 
limits of .803 and .700. The number of items meeting the model fit 
criterion was 13 for disadvantaged and 9 for advantaged children. Of 
these items, 8 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 9.67) in favor of 
the advantaged group. 

The 8 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.703 with 95 percent confidence limits of .768 and .628. The
reliability of these items for the advantaged group sample was .665, with
95 percent confidence limits of .734 and .588. Adjusted to a base of 
50 items, these reliabilities were, respectively, .937 with 95 percent 
confidence intervals of .950 and .922; and .926 with 95 percent confi- 
dence intervals of .940 and .910. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 8.45) in favor of 
the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for both groups was less than .70, namely, 
.628 and .588, respectively, the items were not analyzed by combining 
the two groups into one. 

The 13 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and
showed a reliability of .819 with 95 percent confidence limits of
.856 and .777. The item difficulty indices showed a range from 2 percent
to 98 percent with a median value of approximately 21 percent.
Adjusted to a 50 item base the reliability was .946 with 95 percent 
confidence limits of .957 and .934. 

Fourteenth Item Set - Conclusions.--The reliability estimates for the
common items for both the disadvantaged and the advantaged groups were 
too small to justify use of the items in a common analysis. 

The 13 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and pro- 
duced a reliability coefficient with a lower 95 percent confidence limit 
of .777. Because this coefficient was greater than .70, interval scale 
conversions were made for the disadvantaged group. The range of the 
item difficulties was good, but the distribution tended to the difficult
end of the range.

The data indicate that the advantaged children outperform those of
the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 15 items
or upon the means of the 8 items that fit the scaling model for both
groups.

Fifteenth Item Set - Description: ITPA Auditory-Vocal Sequencing.--
The Auditory-Vocal Sequencing test of the ITPA assesses the ability of
a child to correctly repeat a sequence of symbols previously heard.
This is tested by a modified digit repetition test. There are 20 items
in this test with the easiest item containing two digits and the most 
difficult item containing seven digits. The digits are read to the 
child at the rate of two per second. The child must always repeat the
digits in the same order that he heard them. 

This test might be more properly referred to as a test of short- 
term auditory memory for numbers. Adequate hearing ability is an 
obviously critical factor for success on this test. 

Fifteenth Item Set - Findings.--The scaling analysis of the 20 items
showed a reliability for the disadvantaged sample of .818 with 95 per- 
cent confidence limits of .855 and .777. The reliability of these 
items for the advantaged sample was .830 with 95 percent confidence 
limits of .863 and .793. The number of items meeting the model fit 
criterion was 14 for disadvantaged and 13 for advantaged children. Of
these items, 11 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all 
items showed statistical significance (z = 2.58) in favor of the ad- 
vantaged group. 

The 11 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.690 with 95 percent confidence limits of .754 and .618. The reli- 
ability of these items for the advantaged group sample was .771, with 
95 percent confidence limits of .816 and .720. Adjusted to a base of 
50 items, these reliabilities were, respectively, .910 with 95 percent 
confidence intervals of .928 and .890; and .939 with 95 percent confi- 
dence intervals of .950 and .925. 

A comparison of the raw score means of the two groups based on the 
items showed statistical significance (z = 2.68) in favor of the
advantaged group.

Since the lower limit of the 95 percent confidence interval of
the reliability coefficient for the disadvantaged group was less than 
.70, namely, .618, the items were not analyzed by combining the two
groups into one.

The 14 items which met the model fit criterion at the first
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .781 with 95 percent confidence limits of .825 
and .731. The item difficulty indices showed a range from 1 percent 
to 99 percent with a median value of approximately 24 percent. Adjusted 
to a 50 item base the reliability was .927 with 95 percent confidence 
limits of .942 and .911. 

Fifteenth Item Set - Conclusions.--The reliability estimate for the
common items for the disadvantaged group was too small to justify use
of the items in a common analysis.

The 14 items that met the model fit criterion for the disadvantaged
group at the first analysis were reanalyzed for that group only and
produced a reliability coefficient with a lower 95 percent confidence limit
of .731. Because this coefficient was greater than .70, interval scale
conversions were made for the disadvantaged group. The range of the
item difficulties was good, but the distribution tended to the difficult
end of the range.

The data indicate that the advantaged children outperform those of 
the disadvantaged group to a significant extent but not as much as is 
typical of other item sets. This fact is true whether the comparison 
is based upon the means of the original 20 items or upon the means of 
the 11 items that fit the scaling model for both groups. 

Sixteenth Item Set - Description: Stanford-Binet Intelligence Scale,
Form L-M.--The Stanford-Binet Intelligence Scale, Form L-M, consists of
items that represent a heterogeneous set of tasks. For the purposes
of the present study, items ranging from year II to year VII, inclusive,
served as the basis for testing. The tasks these items represent
are verbal, non-verbal, and manipulative. Examples of verbal item
types are vocabulary, similarity and differences, comprehension, etc. 
Non-verbal items include delayed memory for objects and pictures, 
identification of objects by use, visual discrimination of similar 
pictures, etc. Manipulative items include button sorting, paper 
folding, maze tracing and the like. Cultural bias is probably a factor 
affecting the scores on these items because of the verbal emphasis and 
type of content that the items represent.

The particular way in which the administration of the tests in 
this study was accomplished resulted in a total number of items that
exceeds the number indicated in the standard version of the Binet.
For example, items that normally require fewer correct responses for
credit than the number of stimuli were administered in their entirety 
in each case and were scored as if each stimulus was a separate item. 
Hence, the total number of items associated with this test in this 
study is 216. 

Because the capacity of the scaling program did not permit the 
analysis of more than ninety-nine items at a time, a division of the 
Binet items into subgroups was necessary. The item set currently under 
consideration consists of items derived from Binet items II-l through 

Sixteenth Item Set - Findings.--The scaling analysis of the 99 items
showed a reliability for the disadvantaged sample of .920 with 95 per- 
cent confidence limits of .928 and .911. The reliability of these items 
for the advantaged sample was .804 with 95 percent confidence limits of 
.823 and .782. The number of items meeting the model fit criterion was 
75 for disadvantaged and 9 for advantaged children. Of these items, 
none was judged to fit the model for both groups; hence, no further 
analysis was performed with the advantaged group. 

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 19.71) in favor
of the advantaged group. 

The 75 items which met the model fit criterion at the first scaling 
analysis for the disadvantaged sample were reanalyzed and showed a reli- 
ability of .904 with 95 percent confidence limits of .915 and .893. The 
item difficulty indices showed a range from 34 percent to 100 percent 
with a median value of approximately 95 percent. Adjusted to a 50 item 
base, the reliability was .863 with 95 percent confidence limits of 
.878 and .847. 

Sixteenth Item Set - Conclusions.--Because no items were commonly
retained for the advantaged and disadvantaged groups, there was no
indication that the two populations develop in the same order the
competencies measured by these items; further, no additional analyses were
performed for the advantaged group.

The 75 items that met the model fit criterion for the disad- 
vantaged group at the first analysis were reanalyzed for that group 
only and produced a reliability coefficient with a lower 95 percent 
confidence limit of .893. Because this coefficient was greater than
.70, interval scale conversions were made for the disadvantaged group.
The range and distribution of the item difficulties were poor, the 
distribution tending to the easy end of the range. 

Based on the original 99 items, the data indicate that the ad- 
vantaged children outperform those of the disadvantaged group to a 
very great extent. No further comparisons were possible. 

Seventeenth Item Set - Description: WPPSI Information.--The Information
test from the WPPSI consists of 23 items intended to be arranged from
easiest to most difficult. The test includes items such as:

1. Show me your nose. Touch it. 

12. What do you need to put two pieces of wood together? 

23. Where does the sun set? 

These items are intended to tap the subject's general range of
information. All of the items seem to require the type of knowledge 
that an average individual with average opportunities might be able to 
acquire for himself. Specialized and academic knowledge is avoided 
but the effects of formal schooling may be influential. Knowledge of 
this type does seem to presuppose normal opportunity to receive verbal
information and, as such, this would appear to be a poor test for 
people from deprived experiential backgrounds or people with a foreign 
language handicap. 

Seventeenth Item Set - Findings.--The scaling analysis of the 23 items
showed a reliability for the disadvantaged sample of .846 with 95 per- 
cent confidence limits of .863 and .828. The reliability of these 
items for the advantaged sample was .785 with 95 percent confidence 
limits of .806 and .762. The number of items meeting the model fit 
criterion was 15 for disadvantaged and 13 for advantaged children. Of
these items, 10 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 23.42) in favor 
of the advantaged group. 

The 10 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group sample 
of .751 with 95 percent confidence limits of .779 and .721. The reli- 
ability of these items for the advantaged group sample was .653, with 
95 percent confidence limits of .689 and .614. Adjusted to a base of 
50 items, these reliabilities were, respectively, .938 with 95 percent 
confidence intervals of .945 and .931; and .904 with 95 percent confi- 
dence intervals of .914 and .894. 

A comparison of the raw score means of the two groups based on the 
items showed high statistical significance (z = 20.79) in favor of the
advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for the advantaged group was less than .70, 
namely, .614, the items were not analyzed by combining the two groups
into one.

The 15 items which met the model fit criterion at the first
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .799 with 95 percent confidence limits of .821 
and .776. The item difficulty indices showed a range from 1 percent 
to 95 percent with a median value of approximately 53 percent. Adjusted 
to a 50 item base the reliability was .930 with 95 percent confidence 
limits of .937 and .922.

Seventeenth Item Set - Conclusions.--The reliability estimate for
the common items for the advantaged group was too small to justify 
use of the items in a common analysis. 

The 15 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and pro- 
duced a reliability coefficient with a lower 95 percent confidence limit 
of .776. Because this coefficient was greater than .70, interval scale 
conversions were made for the disadvantaged group. The range and dis- 
tribution of the item difficulties were good. 

The data indicate that the advantaged children outperform those
of the disadvantaged group to a very great extent. This fact is true
whether the comparison is based upon the means of the original 23 items 
or upon the means of the 10 items that fit the scaling model for both 
groups. 

Eighteenth Item Set - Description: WPPSI Vocabulary.--The WPPSI
Vocabulary test consists of a list of 22 words arranged in order of
difficulty from easiest to most difficult. Examples of this range of
difficulty are as follows:

1. Shoe

11. Castle

22. Gamble

This test calls for the definition of words. In general, any 
recognized meaning of the word is acceptable, disregarding elegance 
of expression. Poverty of content is penalized, however. Thus, the 
results are necessarily influenced by the subject's cultural and edu- 
cational background. Since each word is read to the subject the effects 
of reading difficulties should be minimized. 




Eighteenth Item Set - Findings.--The scaling analysis of the 22 items
showed a reliability for the disadvantaged sample of .803 with 95 per- 
cent confidence limits of .824 and .781. The reliability of these 
items for the advantaged sample was .779 with 95 percent confidence 
limits of .801 and .756. The number of items meeting the model fit 
criterion was 18 for disadvantaged and 16 for advantaged children.
Of these items, 13 were judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 30.87) in favor
of the advantaged group. 

The 13 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group
sample of .662 with 95 percent confidence limits of .699 and .623.
The reliability of these items for the advantaged group sample was
.620, with 95 percent confidence limits of .659 and .579. Adjusted 
to a base of 50 items, these reliabilities were, respectively, .883 
with 95 percent confidence intervals of .895 and .870; and .863 with 
95 percent confidence intervals of .876 and .848. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 2o.21) in favor
of the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for both groups was less than .70, namely, 
.623 and .579, respectively, the items were not analyzed by combining 
the two groups into one. 

The 18 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .754 with 95 percent confidence limits of
.790 and .737. The item difficulty indices showed a range from 2 
percent to 98 percent with a median value of approximately 22 per- 
cent. Adjusted to a 50 item base the reliability was .900 with 95 
percent confidence limits of .911 and .889. 

Eighteenth Item Set - Conclusions.--The reliability estimates for
the common items for both the advantaged and the disadvantaged groups 
were too small to justify use of the items in a common analysis. 

The 18 items that met the model fit criterion for the disad- 
vantaged group at the first analysis were reanalyzed for that group 
only and produced a reliability coefficient with a lower 95 percent 
confidence limit of .737. Because this coefficient was greater than 
.70, interval scale conversions were made for the disadvantaged group. 
The range of the item difficulties was good, but the distribution 
tended to the difficult end of the range. 

Nineteenth Item Set - Description: WPPSI Arithmetic.--The WPPSI
Arithmetic test consists of 20 items arranged in order of difficulty from
easiest to hardest. Examples illustrating this range are as follows: 

1. (Consists of a large card with three different 
size balls on it - child must point to largest.) 

10. Harry had 2 pennies and his daddy gave him 1 more.
How many did he have altogether?

20. James had 8 marbles and he bought 6 more. How many 
marbles did he have? 

The first four items of the test use cards printed with pictures 
of various objects. These were designed to measure basic quantitative 
concepts without involving the explicit use of numbers. The remaining 
sixteen items touch upon commonplace situations and involve simple 
calculations. While the computational skills required to solve
the problems are not beyond those taught in the first grade, the
test is obviously heavily influenced by formal schooling, i.e.,
kindergarten or first grade experience. Each item is read to the
child, however, which avoids the need for verbalization on his part
and largely eliminates the effects of reading difficulties.

Nineteenth Item Set - Findings.--The scaling analysis of the 20 items
showed a reliability for the disadvantaged sample of .807 with 95 
percent confidence limits of .828 and .785. The reliability of these 
items for the advantaged sample was .844 with 95 percent confidence 
limits of .860 and .827. The number of items meeting the model fit 
criterion was 12 for disadvantaged and 9 for advantaged children.
Of these items, 6 were judged to fit the model for both groups.

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 21.00) in favor
of the advantaged group. 

The 6 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group 
sample of .604 with 95 percent confidence limits of .650 and .555.
The reliability of these items for the advantaged group sample was
-.380, with 95 percent confidence limits that are meaningless.
Adjusted to a base of 50 items, the reliability for the disadvantaged
group was .927 with 95 percent confidence intervals of .935 and .919. 
The adjustment was not made for the advantaged group. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 16.22) in favor
of the advantaged group. 



Since the lower limit of the 95 percent confidence interval of
the reliability coefficient for both groups was less than .70, namely, 
.555 and undetermined, respectively, and since the lower limit of the 
95 percent confidence interval of the easiness parameter correlation 
was less than .80, namely, -.660, the items were not analyzed by com- 
bining the two groups into one. 

The 12 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .732 with 95 percent confidence limits of 
.762 and .701. The item difficulty indices showed a range from 1 per- 
cent to 97 percent with a median value of approximately 53 percent. 
Adjusted to a 50 item base the reliability was .919 with 95 percent 
confidence limits of .928 and .910. 

Nineteenth Item Set - Conclusions.--The correlation between the 6 pairs
of item easiness parameter estimates derived from disadvantaged and ad- 
vantaged children was small enough to cast doubt on the contention that 
the two populations develop in the same order the competencies measured 
by these items. Also the reliability estimates for the common items 
for both the disadvantaged and the advantaged groups were too small to 
justify use of the items in a common analysis. 

The 12 items that met the model fit criterion for the disadvantaged 
group at the first analysis were reanalyzed for that group only and 
produced a reliability coefficient with a lower 95 percent confidence 
limit of .701. Because this coefficient was greater than .70, interval 
scale conversions were made for the disadvantaged group. The range and 
distribution of the item difficulties were good. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is 
true whether the comparison is based upon the means of the original 
20 items or upon the means of the 6 items that fit the scaling model 
for both groups. 

Twentieth Item Set - Description: Arthur Adaptation of the Leiter 
International Performance Scale.--The Arthur Adaptation of the Leiter 
International Performance Scale is a non-verbal test that requires 
the subject to place blocks into the stalls of a wooden frame. The 
correct placement of the blocks requires the subject to examine the 
pictures, patterns, or colors that are printed on a cardboard strip 
that is placed on the frame above the stalls. The first item in the 
test is at a two-year level and requires the subject to match five 
blocks of different colors with the colored squares printed on the 
strip and place the blocks into the corresponding stalls of the frame. 
The items progress in difficulty and include tasks involving block 
design, picture completion, number discrimination, form-color, 
form-color-number, genus determination, analogous progression of forms, 
pattern completion, coding and recognition of age differences. The 
items used in the present study were those of the year two level 
through the year seven level. 

The tasks that these items represent are omnibus in character, 
much like the items of the Stanford-Binet and other similar tests that 
are varied in content and concept. Some cultural bias may be present 
in the items that include pictures of persons and objects, but most of 
the items deal with colors, shapes and forms and patterns. 







Twentieth Item Set - Findings.--The scaling analysis of the 27 items 
showed a reliability for the disadvantaged sample of .804 with 95 
percent confidence limits of .846 and .757. The reliability of these 
items for the advantaged sample was .740 with 95 percent confidence 
limits of .805 and .665. The number of items meeting the model fit 
criterion was 17 for disadvantaged and 10 for advantaged children. 

Of these items, 10 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 10.98) in favor 
of the advantaged group. 

The 10 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group 
sample of .759 with 95 percent confidence limits of .814 and .695. 
The reliability of these items for the advantaged group sample was 
.611, with 95 percent confidence limits of .732 and .463. Adjusted 
to a base of 50 items, these reliabilities were, respectively, .940 
with 95 percent confidence intervals of .954 and .925; and .887 with 
95 percent confidence intervals of .921 and .847. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 10.49) in favor 
of the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for both groups was less than .70, namely, 
.695 and .463, respectively, and since the lower limit of the 95 percent 
confidence interval of the easiness parameter correlation was less than 
.80, namely, .649, the items were not analyzed by combining the two 
groups into one. 

The 17 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .779 with 95 percent confidence limits of 
.828 and .723. The item difficulty indices showed a range from 9 
percent to 99 percent with a median value of approximately 86 percent. 
Adjusted to a 50 item base the reliability was .912 with 95 percent 
confidence limits of .931 and .890. 

Twentieth Item Set - Conclusions.--The correlation between the 10 
pairs of item easiness parameter estimates derived from advantaged 
and disadvantaged children was small enough to cast doubt on the con- 
tention that the two populations develop in the same order the compe- 
tencies measured by these items. Also the reliability estimates for 
the common items for both the advantaged and the disadvantaged groups 
were too small to justify use of the items in a common analysis. 

The 17 items that met the model fit criterion for the disadvan- 
taged group at the first analysis were reanalyzed for that group only 
and produced a reliability coefficient with a lower 95 percent confi- 
dence limit of .723. Because this coefficient was greater than .70, 
interval scale conversions were made for the disadvantaged group. The 
range of the item difficulties was good, but the distribution tended 
to the easy end of the range. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 27 
items or upon the means of the 10 items that fit the scaling model for 
both groups. 

Twenty-First Item Set - Description: Merrill-Palmer Scale of Mental 
Tests.--The Merrill-Palmer Scale of Mental Tests consists of items 
that are both verbal and non-verbal. The verbal items include simple 
questions--"What does a doggie say?" "What is your name?"--action 
agents--"What sleeps?" "What scratches?"--and repetition of words. 
The non-verbal items include obeying simple commands, standing on 
one foot, cutting with scissors, copying a star, form boards and 
picture puzzles as well as boards. 

For the purposes of analysis the items of the Merrill-Palmer were 
grouped into three sets. The item set under consideration here was 
labeled "information" and consisted of the following items: the simple 

questions, the action agents and the identification of one's self in a 
mirror. There were thirty-one such items. 

Twenty-First Item Set - Findings.--The scaling analysis of the 31 items 
showed a reliability for the disadvantaged sample of .822 with 95 
percent confidence limits of .859 and .781. The reliability of these 
items for the advantaged sample was .617 with 95 percent confidence 
limits of .700 and .524. The number of items meeting the model fit 
criterion was 22 for disadvantaged and 10 for advantaged children. Of 
these items, none were judged to fit the model for both groups; and, 
hence, no common analysis was possible. 

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 15.88) in favor of the 
advantaged group. 

The 22 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .804 with 95 percent confidence limits of .845 
and .758. The item difficulty indices showed a range from 21 percent 
to 99 percent with a median value of approximately 89 percent. 
Adjusted to a 50 item base the reliability was .903 with 95 percent 
confidence limits of .923 and .881. 

Twenty-First Item Set - Conclusions.--No common analysis was possible 
for the two groups. 

The 22 items that met the model fit criterion for the disad- 
vantaged group at the first analysis were reanalyzed for that group 
only and produced a reliability coefficient with a lower 95 percent 
confidence limit of .758. Because this coefficient was greater than 
.70, interval scale conversions were made for the disadvantaged group. 
The range of the item difficulties was good, but the distribution 
tended to the easy end of the range. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
when the comparison is based upon the means of the original 31 items. 

Twenty-Second Item Set - Description: Oseretsky Tests of Motor 
Proficiency.--The Oseretsky Tests of Motor Proficiency were designed for 
use with children from four to sixteen years of age. In the present 
study the items from the four-year through the seven-year level were 
utilized. Each year level consists of six items, each item 
representing a different type of motor proficiency. The items of the 
five-year level and the type each represents are: stand in upright 
position on tip-toe with eyes open for ten seconds (static 
coordination); hop on one foot for a distance of six feet with eyes 
open (dynamic coordination); form a small ball by rolling up a small 
square of thin paper with the fingers of one hand (dynamic 
coordination of the hands); roll a thread on a spool in a specified 
time (motor speed); put matchsticks into a box using both hands 
(simultaneous voluntary movements); and clench teeth and show them by 
parting the lips (associated involuntary movements, i.e., ability to 
perform without superfluous movements). 

Twenty-Second Item Set - Findings.--The scaling analysis of the 25 
items showed a reliability for the disadvantaged sample of .773 with 
95 percent confidence limits of .820 and .720. The reliability of 
these items for the advantaged sample was .743 with 95 percent 
confidence limits of .794 and .686. The number of items meeting the 
model fit criterion was 19 for disadvantaged and 18 for advantaged 
children. Of these items, 14 were judged to fit the model for both 
groups. 

A comparison of the raw score means of the two groups based on 
all items showed no statistical significance (z = .46). 

The 14 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group 
sample of .630 with 95 percent confidence limits of .710 and .540. 
The reliability of these items for the advantaged group sample was 
.613 with 95 percent confidence limits of .692 and .524. Adjusted 
to a base of 50 items, these reliabilities were, respectively, .859 
with 95 percent confidence intervals of .889 and .825; and .850 with 
95 percent confidence intervals of .880 and .816. 

A comparison of the raw score means of the two groups based on 
the items showed no statistical significance (z = .80). 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for both groups was less than .70, namely, 
.540 and .524 respectively, and since the lower limit of the 95 
percent confidence interval of the easiness parameter correlation was 
less than .80, namely, .616, the items were not analyzed by combining 
the two groups into one. 

The 19 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .763 with 95 percent confidence limits of 
.813 and .707. The item difficulty indices showed a range from 9 per- 
cent to 98 percent with a median value of approximately 82 percent. 
Adjusted to a 50 item base the reliability was .894 with 95 percent 
confidence limits of .916 and .870. 

Twenty-Second Item Set - Conclusions.--The correlation between the 14 
pairs of item easiness parameter estimates derived from advantaged 
and disadvantaged children was small enough to cast doubt on the 
contention that the two populations develop in the same order the 
competencies measured by these items. Also the reliability estimates for 
the common items for both the advantaged and the disadvantaged groups 
were too small to justify use of the items in a common analysis. 

The 19 items that met the model fit criterion for the 
disadvantaged group at the first analysis were reanalyzed for that group 
only and produced a reliability coefficient with a lower 95 percent 
confidence limit of .707. Because this coefficient was greater than 
.70, interval scale conversions were made for the disadvantaged group. 
The range of the item difficulties was good, but the distribution 
tended to the easy end of the range. 

Twenty-Third Item Set - Description: Peabody Picture Vocabulary 
Test.--The Peabody Picture Vocabulary Test consists of 150 plates, each 
of which presents four pictures. The subject responds to each plate by 
pointing to the picture that he thinks represents the word that the 
examiner has pronounced. Some of the words in the test label objects, 
some label actions and some label concepts. The items are arranged 
in increasing order of difficulty, and the first eighty items were 
used in the present study. The vocabulary words, from easy to 
difficult, are represented by the following: table, climbing, snake, 
temperature, locomotive and autumn. 

Twenty-Third Item Set - Findings.--The scaling analysis of the 80 items 
showed a reliability for the disadvantaged sample of .881 with 95 per- 
cent confidence limits of .906 and .853. The reliability of these 
items for the advantaged sample was .782 with 95 percent confidence 
limits of .825 and .734. The number of items meeting the model fit 
criterion was 49 for disadvantaged and 54 for advantaged children. Of 
these items, 35 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on all 
items showed high statistical significance (z = 12.79) in favor of the 
advantaged group. 

The 35 commonly fitting items were analyzed separately for the 
two groups, and showed a reliability for the disadvantaged group sample 
of .801 with 95 percent confidence limits of .842 and .755. The 
reliability of these items for the advantaged group sample was .722, with 
95 percent confidence limits of .777 and .660. Adjusted to a base of 
50 items, these reliabilities were, respectively, .852 with 95 percent 
confidence intervals of .883 and .817; and .788 with 95 percent 
confidence intervals of .830 and .740. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 13.11) in favor 
of the advantaged group. 

Since the lower limit of the 95 percent confidence interval of 
the reliability coefficient for the advantaged group was less than 
.70, namely, .660, the items were not analyzed by combining the two 
groups into one. 

The 49 items which met the model fit criterion at the first 
scaling analysis for the disadvantaged sample were reanalyzed and 
showed a reliability of .835 with 95 percent confidence limits of 
.869 and .797. The item difficulty indices showed a range from 9 per- 
cent to 99 percent with a median value of approximately 66 percent. 
Adjusted to a 50 item base, the reliability was .838 with 95 percent 
confidence limits of .871 and .800. 

Twenty-Third Item Set - Conclusions.--The reliability estimate for the 
common items for the advantaged group was too small to justify use of 
the items in a common analysis. 

The 49 items that met the model fit criterion for the disadvan- 
taged group at the first analysis were reanalyzed for that group only 
and produced a reliability coefficient with a lower 95 percent confi- 
dence limit of .797. Because this coefficient was greater than .70, 
interval scale conversions were made for the disadvantaged group. The 
range and distribution of the item difficulties were good. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a very great extent. This fact is true 
whether the comparison is based upon the means of the original 80 
items or upon the means of the 35 items that fit the scaling model 
for both groups. 

Twenty-Fourth Item Set - Description: Let's Look at First Graders 
(Shapes and Forms).--A set of instructional materials was developed 
in 1965 for the Board of Education of the City of New York. Included 
in these materials was a series of pseudo-tests that were to be used 
instructionally. These materials encompassed spatial relations, shapes 
and forms, communication skills, time concepts, arithmetic, and 
reasoning, each of these categories comprising six exercises arranged 
in increasing order of difficulty. For the purpose of the present study, 
instructions were written so that these materials could be given as 
tests and the several different exercises were dispersed through the 
four batteries of items that were administered to the subjects in the 
study. The particular item set under consideration at this point 
consisted of exercise numbers one, three and five in the shapes and forms 
category. The items in the first exercise presented the subject with 
a shape or form such as a triangle and required him to select from 
three alternatives a form of the same type, but smaller size, that 
might or might not be inverted or rotated. The items of the fifth 
exercise consisted of the same general type of task, but the subject 
was required to make more sophisticated discriminations that might 
include shading as well as form. The items of exercise three were of 
moderate difficulty. None of these items, however, was very complex. 

Twenty-Fourth Item Set - Findings.--The scaling analysis of the 26 items 
showed a reliability for the disadvantaged sample of .833 with 95 
percent confidence limits of .873 and .788. The reliability of these items 
for the advantaged sample was .506 with 95 percent confidence limits 
of .605 and .395. The number of items meeting the model fit criterion 
was 19 for disadvantaged and 13 for advantaged children. Of these 
items, 9 were judged to fit the model for both groups. 

A comparison of the raw score means of the two groups based on 
all items showed high statistical significance (z = 4.10) in favor of 
the advantaged group. 

The 9 commonly fitting items were analyzed separately for the two 
groups, and showed a reliability for the disadvantaged group sample of 
.557 with 95 percent confidence limits of .703 and .371. The 
reliability of these items for the advantaged group sample was -.268, with 
95 percent confidence limits meaninglessly low. Adjusted to a base of 
50 items, the reliability for the disadvantaged group was .875 with 95 
percent confidence intervals of .915 and .827. The reliability for 
the advantaged group was not adjusted. 

A comparison of the raw score means of the two groups based on 
the items showed high statistical significance (z = 4.68) in favor of 
the advantaged group. 

The reliabilities were too small to justify a combined analysis. 

The 19 items which met the model fit criterion at the first scaling 
analysis for the disadvantaged sample were reanalyzed and showed a 
reliability of .785 with 95 percent confidence limits of .844 and .715. 
The item difficulty indices showed a range from 39 percent to 99 
percent with a median value of approximately 87 percent. Adjusted to a 
50 item base the reliability was .906 with 95 percent confidence limits 
of .931 and .876. 

Twenty-Fourth Item Set - Conclusions.--The reliability estimates for 
the common items for both the disadvantaged and the advantaged groups 
were too small to justify use of the items in a common analysis. 

The 19 items that met the model fit criterion for the 
disadvantaged group at the first analysis were reanalyzed for that group 
only and produced a reliability coefficient with a lower 95 percent 
confidence limit of .715. Because this coefficient was greater than .70, 
interval scale conversions were made for the disadvantaged group. 
The distribution of the item difficulties tended to the easy end of 
the range. 

The data indicate that the advantaged children outperform those 
of the disadvantaged group to a significant extent. This fact is true 
whether the comparison is based upon the means of the original 26 
items or upon the means of the 9 items that fit the scaling model for 
both groups. 



PART VI 



CONCLUSIONS AND IMPLICATIONS 



The purpose of the investigation was to identify scales which 
would be descriptive of the development of problem-solving skills 
in young children. These scales were sought in answer to the 
following question: Do children of different backgrounds exhibit 
similarities in the order of development and levels of achievement of 
problem-solving behaviors? Collection and analysis of the data have 
led the present investigators to reach several conclusions 
concerning the answer to this question, as well as several conclusions 
concerning the research methods necessary in this type of investigation. 
These conclusions with some of their implications follow. 

The first, and perhaps most significant, conclusion reached in 
the investigation is that there are problem-solving skills that 
develop in the same order among children of extremely different 
backgrounds. The nine sets of items which were found to scale in the 
same way for both advantaged and disadvantaged children are empirical 
evidence of this phenomenon. Of course, it is necessary to immediately 
qualify the conclusion on the basis of the characteristics of the 
sample. Only children four through six years old were tested. 
There seems little reason, however, to suspect that the 
generalization would not hold for younger and older children. Next, only 
advantaged and disadvantaged children were included. The 
investigators selected these two groups in order to maximize differences 
in children. By choosing children of such different socio-economic 
classes, it was expected that extremes in opportunities to learn 
and in levels of achievement would be obtained. (The fact that 
the advantaged children performed significantly better than the 
disadvantaged children on each of the nine item sets is evidence 
of considerable differences in levels of achievement.) If item 
sets could be found that scaled in the same way for such different 
groups, the present investigators argued, then it would be 
reasonable to hypothesize that the scales were "universal." Naturally 
it will be necessary to test this hypothesis through the validation 
of the nine item sets with other groups of children. 

Now what are the immediate implications of the fact that nine 
scales have been identified that reflect a common ordering of problem- 
solving skills in children of quite different backgrounds? Apparently, 
there is sufficient commonality of developmental sequencing to make 
possible a certain amount of "culture constant" assessment of young chil- 
dren in the nation. Although some caution must be exercised in in- 
terpreting these scales as representing developmental patterns along 
discrete underlying continua (many items which did not scale for the 
two groups appeared logically to be very much like items that did 
scale for the two groups), the common scales would seem to have 
immediate implications for mental measurement, educational programs 
for young children, and perhaps even theory construction. 

With respect to mental measurement, the nine sets of items 
obviously constitute an excellent starting point for the construction 
of instruments which can be used across sub-populations of the country. 
Aside from the fact that these item sets might be used in their present 
form (after validation, of course, and with the special 
instructions developed for each, e.g., "Maximum Performance Testing"), 
they certainly would provide researchers with the kinds of tasks 
which might be expected to develop in some common order among 
children. The reader should be reminded that these item sets met 
specified reliability criteria and produced measurements of interval 
scale quality. 

As to the development of educational programs for young chil- 
dren, the nine common scales can serve as guides for the sequencing 
of curricula. Although certainly not comprehensive in any sense 
at the present stage of the research, these item sets can provide 
curriculum planners with what might be important insights into the 
order in which children develop various skills. A salient point 
here is that the sequencing was the same for both groups on these 
item sets but the advantaged children (as a group) were always further 
along the scale. In other words, the disadvantaged children learn 
the same things as the advantaged children and in the same order; 
they simply take longer to do it. Here the problem is not whether 
to require disadvantaged children to learn the same things as ad- 
vantaged children or to teach a middle class culture to lower class 
children. The point is that both groups develop these skills and 
the difference is in the distance that they have moved along the 
scale. Therefore the problem for curriculum planners is how to 
accelerate disadvantaged children along these continua. The reader 
will recognize that these scales are related to instructional goals, 
not to the methodologies by which these goals might be achieved. 

A note concerning the relationship of the conclusion to theory 
is required. In the first place, it must be made clear that the 
present investigators do not contend that the nine scales 
necessarily represent true developmental sequencing. They do contend, 
however, that these item sets reflect readiness patterns to the 
extent that they scale in the same way across subpopulations or 
subcultures. Presumably each item is only a sample of a category 
of items that reflect a position on a continuum. Further research 
is needed to define these more exactly. Thus, the theorist must 
take care that he understands what these scales represent before 
applying them to theory validation or modification. 

In addition, it must be pointed out that the research reported 
here was not designed in the framework of a theory nor to test hy- 
potheses emanating from a theory. In Part III an inductive research 
context was presented but this amounted to little more than a general 
view of readiness based on assumptions to be tested in the investi- 
gation. Therefore, the contention that the scales generated are 
consistent or inconsistent with some theory or theories must be 
made most tentatively. 

A second conclusion reached in the investigation is that there 
are particular problem-solving skills that develop in a particular 
order for disadvantaged children, an order that differs from the 
order in which these skills develop in advantaged children. Fifteen 
item sets were generated in the study which were ordered for 
disadvantaged children but not ordered in the same way for advantaged 
children, considering the reliability criteria. While it is true that 
the advantaged children outperformed the disadvantaged children on all 
but one of the fifteen item sets (the Oseretsky, consisting of motor 
performance items), the items did not scale in the same way. 

Perhaps the most immediately useful implication of the finding 
is in the field of testing disadvantaged children. Given the unique 
form of testing employed and the scaling technique used to construct 
interval scales, it is possible that the achievement of disadvantaged 
children can be much more adequately determined. Apparently edu- 
cators and researchers may be attempting to measure problem-solving 
status and growth with instruments that do not take into consider- 
ation those developmental patterns unique to disadvantaged children. 

In some contrast to the "culture constant" testing mentioned above, 
it seems likely that tests can be developed that are "culture biased" 
in order to determine more sensitively the progress of children in 
certain subpopulations. 

In the area of curriculum development, the identification of 
the fifteen scales unique to the disadvantaged child is more 
speculative with respect to implications. Comparisons of the two groups’ 
performance on items that do not scale commonly might be useful in 
identifying areas in which disadvantaged students require particular 
instruction. Of course this approach would be based on the fact that 
advantaged children are more successful in school than disadvantaged 
children and on the assumption that the present expectations of schools 
are appropriate for all children. This becomes a philosophical 
consideration immediately which must be resolved by the society and not 
by research findings. Nevertheless, the study of empirically-derived 
scales revealing differences in levels of achievement between the 
advantaged and the disadvantaged child and in the order in which 
problem-solving skills develop must lead to significant insights 
into curriculum improvement. 

A third conclusion forced by the data is that many item sets 
did not scale with sufficient reliability for the disadvantaged 
children after the scaling analysis had eliminated items that did 
not meet the model fit criteria. In the case of thirty-two item 
sets, the reliability lower limits were only sufficient when each 
item set was adjusted to a fifty item base. Seven other sets were 
not sufficiently reliable even when the projected adjustment was 
made. Eight more item sets contained too few items after scaling 
to allow any further analysis. 

The implications of these item sets failing to meet the estab- 
lished criteria for further study lack definitiveness; nevertheless, 
a study of the data does yield a suggestion. In the first place, 
the item sets must be viewed as insufficient in the present context 
to indicate useful sequencing or achievement levels of either the 
combined groups or the disadvantaged group alone. On the other hand, 
the easiness parameter intercorrelations of some item sets indicate 
that both groups may be following a common general sequence of skill 
development but that the reliability associated with the scores is 
not of the quality required. If this is true, these item sets, 
particularly those in the third grouping which did have sufficient 
reliability estimates for the disadvantaged group when projected to 
a fifty-item base (thirty-two sets), also may be a fruitful starting 
place for the development of new instruments. 

In Part III of the present report, the existence of readiness 
behaviors was defined as being units of behavior that were learned 
or performed by individuals prior to other units of behavior. Further, 
it was suggested that these behaviors were sequenced in particular 
ways for one of at least three reasons: the ordering was inherent 
in the organism, the order was inherent in the skills themselves, 
or the order represented the sequencing of experiences within the 
culture. The latter, of course, means simply that the society 
generally provides the child with the opportunities to learn one 
unit of behavior before another so consistently that the sequences 
are definite and discernible. 

As one looks at the groupings of the item sets, he sees some 
that scale for both groups of children, some that scale reliably 
for the disadvantaged only and some that do not scale for the dis- 
advantaged in a reliable way. Is there some general conclusion to 
be drawn from the differences in scaling parameters and reliability 
estimates for the five groups of item sets? Any such conclusion 
must be tentative indeed. Nevertheless, inspection of the data 
does seem to suggest that all tests of problem-solving abilities 
must be to some extent experience-specific. That is to say that 
the tests must be based to a lesser or greater degree on the specific 
experiences of the children for whom they were designed. As tests are 
based more and more on experiences that are common to all children, 
the probability that the tests will tend to scale similarly across 
subpopulations increases. The more particular item sets are based 
on experiences unique to certain groups of children, the less effective 
they may be for use with other groups of children who have not had 
such experiences. Thus, there is evidence of sequences of develop- 
ment that are based on the ordering of society’s experiences for 
children. It seems particularly important in measuring patterns 
of problem-solving development to consider the probability that the 
commonality of experiences of the children involved is a critical 
factor. The idea that children from one sort of background will 
not do as well on certain problems as children from a background 
in which experiences related to the problems have been encountered 
is not new. This idea is certainly supported by the data. But 
another and more important idea is also suggested and that is that 
the ordering of such skills may be different also, whatever the 
achievement levels of the two groups. 

Concerning the methodology employed in the study, several con- 
clusions may be drawn. First, there are a number of reasons related 
to the research methods that could account for an item set failing 
to be reliable or to sequence similarly for the two subpopulations. 
These include the following: measurement error associated with the 
respondents' guessing answers; possible differential effects of the 
"Maximum Performance Testing" approach on the two subpopulations; 
the difficulty characteristics of the items in a particular set (too 
hard, too easy); lack of stability of the item parameter estimates 
resulting from limited samples; and lack of differentiating ability 
of the item set because too few items remained after scaling. 

As to the testing procedures used ("Maximum Performance Test- 
ing"), these appear to be a tenable method for testing problem-solving 
skills in young children. Scales were generated from the data ob- 
tained and communication difficulties between tester and subject 
appeared to be minimal. Presumably this form of testing could 
be used also to test for the mastery of various skills in situations 
other than sequencing problem-solving behaviors. 

The scaling techniques employed in the investigation seem to 
have potential for the identification of developmental sequences 
and perhaps in the eventual construction of developmental networks. 
Twenty-four scales with sufficient reliability were identified. 
This indicates that the procedures were operating 
in a reasonable and expected manner. There are, however, refine- 
ments needed. Some of these are suggested in the following section 
under recommendations. 




PART VII 



RECOMMENDATIONS 



As stated earlier, the investigation reported in the present 
document is only a first step in the identification of a readiness 
network reflecting developmental patterns of problem-solving be- 
havior. Although there are immediate gains to be derived from the 
study, its chief contribution must be the providing of a basis for 
further study in the area of mental development in young children. 
Many recommendations might be made concerning studies that would 
extend the present work and concerning the improvement of the re- 
search tools employed in the study. The present investigators, 
however, have limited themselves to those recommendations which 
seem most cogent; these are enumerated below. 

1. The twenty-four scales generated in the present study 
should be validated with new samples. In the case of the nine item 
sets that scaled commonly for both groups, this validation should 
include divergent groups or subpopulations. In the case of the 
fifteen item sets that scaled for the disadvantaged children only, 
the validation would be with other groups of disadvantaged children. 

2. In collecting validation data or in extending the study 
into other areas, items used in the present investigation should 
be screened and those which did not yield information (too easy or 
too hard) should be eliminated or the age of the sample children 
should be modified so that the items would be more effective. 

3. Further studies of this type should include larger samples 
and fewer items should be administered to a particular sample. In 
addition to increasing stability of the item parameter estimates, 
this would allow factorial procedures to precede the scaling anal- 
ysis. This would provide for the simultaneous analysis of items 
from different tests and make possible the examination of the uni- 
factor structure of an item set. 

4. The selection of items to be used in a set could be based 
on two criteria: (a) a logical consideration of what variable the 
underlying continuum represents and hence the combining of items 
from more than one of the current item sets and (b) an examination 
of the scaling data and the inclusion of only items which individually 
met the model fit criteria in the current analysis. 

5. More work is needed in determining the appropriateness of 
the scaling model and the statistical properties of the methods 
used in the present study. One aspect of this task would be the 
comparison of results for one parameter and two parameter solutions. 

6. When at all possible in this type of research, items should 
be used in a format which minimizes guessing. Otherwise a three 
parameter model may be necessary for accurate description of the 
data. In the three parameter situation, computations become much 
more difficult. 

7. There is a need to characterize the tasks. In other words 
it is necessary to generalize if possible the items at all points 
on a scale to item types. A sequential testing approach could be 
built around pools of items of each task type. These procedures 
would lead to a different definition of reliability, i.e., consistency 
of the estimates of a subject’s ability where additional items are 
given which maximize the information about a subject’s ability. The 
work done in the present study should be particularly useful to in- 
vestigators attempting to develop such testing models. 

8. The item sets generated in the present study which scale 
commonly for both groups and those that scale for the disadvantaged 
only should be used to determine if they are more effective in de- 
tecting changes which may result from various intervention programs. 

9. Using the data collected in the present study, additional 
analyses might be made upon different groupings of the subjects 
tested, e.g., sex, age, geographic region. 

10. Further work should be done in identifying common and unique 
problem-solving scales with younger children and older children than 
were used in the present study. 

11. Logical and analytical procedures for relating scales into 
networks should be developed. 

12. If the twenty-four scales presented in the present document 
are validated, the implications of them for curriculum development 
should be explored. 
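
Recommendations 5 and 6 contrast one, two, and three parameter solutions without giving their functional form in this section. The sketch below assumes the familiar logistic item response function, in which the one parameter case fixes discrimination at 1 and guessing at 0, the two parameter case frees discrimination, and the three parameter case adds a guessing floor. The function and argument names are hypothetical, and the report's own "easiness" parameterization may differ in sign or scale.

```python
import math


def prob_correct(ability: float, difficulty: float,
                 discrimination: float = 1.0, guessing: float = 0.0) -> float:
    """Probability of a correct response under an assumed logistic model.

    one parameter  : difficulty only (discrimination=1, guessing=0)
    two parameter  : difficulty and discrimination
    three parameter: difficulty, discrimination, and a guessing floor
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


if __name__ == "__main__":
    # A child whose ability equals the item difficulty, with and without
    # a hypothetical guessing floor of 0.25 (a four-choice item).
    print(prob_correct(0.0, 0.0))
    print(prob_correct(0.0, 0.0, guessing=0.25))
```

The guessing floor keeps the predicted success rate of even very low-ability children above zero, which is the situation recommendation 6 warns will require the harder three parameter computation.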

ITEM CLASSIFICATION 
OUTLINE 



I. PERFORMANCE - Ideally includes items that require motor skill 
and that are scored for motor coordination or level of physi- 
cal maturity only. 

1. Action Items - examples: jump! stand with your toes 
pointed out. - also includes items that require 
following directions - ex: put the pencil on the 
chair. 

2. Block Building - Ex: the child is asked to build a 
pyramid and has a model to go by. 

3. Object Assembly - This is not like the subtest object 
assembly on the Stanford-Binet which would fall under 
IV - 2 (Spatial, mazes and puzzles) on this classifi- 
cation. Object assembly here refers to stringing 
beads and other similar items that emphasize manual 
dexterity. (ex: pegboards) 

4. Taxonomies - sorting tasks 

II. A. Verbal - includes items that require the child to speak 
and exhibit some verbal skill. Yes and No answers would 
not be included. 

1. Vocabulary 

a) picture identification - items which require the 
child to attach a name and/or story to a picture. 

b) object identification - requires the child to 
attach a name to an object. 

c) definition or word meaning - requires the child 
to verbally define a word. 

d) talking - some tests include a very general score 
on child's chatter throughout the test. 

2. Comprehension 

a) analogies - includes items which require the child 
to supply a missing word. Ex: Summer is hot; 
winter is ____. Though some of these may be 
opposites they are included. 

b) similarities and differences - items requiring 
child to explain how things are alike or dif- 
ferent. Ex: How are a peach and a ball alike? 
How are they different? 

c) interpretation - includes items that require a 
child to explain the meaning of a statement, 
proverb, etc. 

d) explanation - requires a child to explain or 
untangle a sentence or phrase. Ex: What's 
foolish about this sentence? 

3. General Knowledge - Items asking for personal-social 
information (when is your birthday?) or well known 
events (what do we celebrate on the 4th of July?) or 
facts (what is the color of a ruby?) 

B. Non-Verbal - This category covers approximately the same 
areas as II-A (Verbal) but items included here generally 
do not require the child to speak. 

1. Vocabulary 

a) picture identification - items in which the 
tester gives a word and the child points to or 
marks the correct picture. 

b) object identification - same as above except the 
child chooses among objects placed before him. 

2. Comprehension - This is a broad category containing 
items that are intended to evaluate the child's 
understanding of a situation, picture, object, etc. 
Although he may be required to give a verbal answer 
to some of the items, these answers aren’t scored 
for the adequacy of vocabulary but conveyance of some 
central concept. This category also includes some 
items referring to time concepts, depending on the 
form of the item. 

a) picture stories - requires the child to indicate 
in some way what is happening in a picture. 

b) indicate use for ________ - includes items 
which present the child with an object or picture 
and require him to indicate in some manner what 
one does with it. Ex’s: Item - a small cup; 
Response - child pretends to drink. Item - picture 
of a saw; Response - a sawing motion. 

3. Picture, Color or Object Recognition - This, too, is 
a broad category, including a wide range of items 
probably requiring a number of skills. First, items 
which require the child to find a similarity or dif- 
ference in pictures or objects; this differs from 
taxonomical items (also falling in this category) in 
that it is more complex and requires more than simple 
grouping. Ex: Item - picture of large ship (find 
one like this); Response - child chooses among variety 
of objects a small peculiar boat. 

Taxonomies, here, include grouping by color, use, 
etc. This category also includes mutilated picture 
items and the child must point out the inconsistency. 

4. General Knowledge 

a) Ex: pictures of sun, orange and football - "Take 
the yellow crayon (tester gives child the correct 
crayon) and color the one that should be yellow." 

b) pictures of car, bicycle and top - "Mark the one 
that is most expensive." 

5. a) symbol identification - recognition of letters 
Ex: Mark one 
F: S T (F) K 

b) phonetics 
Ex: picture of ball, light and tree - "Mark the 
picture that starts with the same sound as 
boat." 

6. Sequencing - Items here are mainly picture stories cut 
into 3 or more stages and child must arrange these in 
the correct order. Some are reversible. One item 
shows a child building a tower if done one way, and 
taking it down if done another. In this case the 
child must specify what is happening. Some items 
that are set up as sequences fall under IV-6, or 7 
(Spatial Projection or relationships). 

III. NUMERICAL - This category should not include items such as, 
"How many pennies in a nickel?" which fall under Verbal, 
General Knowledge, but items which require only a knowledge 
of numbers and number concepts. 

1. a) number - symbol identification - items which 
require knowledge of printed number symbols 
(1, 2, 3, etc.) 

b) number identification - (should probably be under 
counting) - items which demand knowledge of names 
and numbers. Ex: Tester holds up 3 fingers and 
asks, "How many is this?" 

2. Number Manipulation - direct addition, subtraction, 
etc. Ex: 2 + 2 is how many? There are few items 
of this type. 

3. Numerical Reasoning - Number problems which require 
number manipulation. Ex: If one pencil costs 3 cents, 
how much would two pencils cost? 

4. Counting - counting aloud, handing tester a certain 
number of objects or marking the picture with the 
correct number of items. 

5. Number Concepts - Items which test for the idea of 
relationships such as more, fewer, half as much, etc. 
(some confusing items here - Ex: picture of a whole 
sandwich, then three pictures of same sandwich (1) cut 
in half, (2) cut in thirds, (3) cut in fourths. 
Question - how will this sandwich look when it is cut 
once?) - Is this a number concept or is it spatial? 
(These items were classified as number concepts.) 

IV. SPATIAL - This category contains many items that are usually 
grouped under Performance . They are included here when the 
concepts involve more than physical maturity, muscle coordina- 
tion or speed. 

1. Block Design and Patterning - This is not block 
building, but arrangement according to some precise 
pattern where the only guide is a pattern without 
block division. Items that require the completion of 
a pattern by choosing a matching piece. Items that 
require the cutting or folding of paper to match a 
demonstration model. 

2. Mazes and Puzzles - This category includes all mazes - 
paper and pencil, wood, etc. It also includes 
puzzles of the jigsaw type, puzzles that have only 
one missing piece, formboards, or disentangling two 
fitted pieces (paper-clip type). 

3. Taxonomies - classification according to form, size, 
arrangement, etc. - not usage or color. 

4. Copying of Forms - requires child to copy different 
geometric forms. 

5. Drawing - includes drawing objects or people without 
a model. (Category 4 could be included under Performance, 
but category 5 is relatively independent of drawing skill 
and focuses on inclusion of detail, with relatively 
no emphasis on how well the object is drawn.) 

6. Projection - requires knowledge of behavior of objects 
in space. Ex: Jar half filled with colored water 
standing upright - Task: How will the water look if 
the jar is tilted (demonstrate with empty jar)? The 
child is given a picture of a tilted jar and asked to 
draw the water in it. 

7. Relationships - items which ask which is farther or 
nearer to X, with pictures graded in size. Which is 
larger - smaller? Which mouse is too large to go 
through this hole? 

8. Picture Completion (Closure) - items which require 
the child to identify or finish drawing an incomplete 
form or picture. 

V. MEMORY 

1. Auditory Retention 

a) verbal - includes items which require the child 
to carry out an extended series of instruction, 
to repeat a sentence or phrase or to answer ques- 
tions about a story which he has been read (or to 
retell the story). 

b) numerical - items which require child to repeat a 
series of numbers either as they were called out 
or backward. 




2. Visual Retention - items which require the child to 
repeat words, numbers or letters that he has seen. 
Items that require the child to draw a form which he 
has been shown briefly - or items that require the 
child to imitate an action. 

Test Selection and Battery Construction 



In order to pursue an extensive investigation of problem solving 
behavior it was necessary to obtain as wide a selection of problems 
as possible. The utilization of available instruments was considered 
the most efficient method of obtaining appropriate cognitive and motor [...] 

All available tests and procedures for measuring cognitive de- 
velopment and psychomotor skills in children from three to seven 
years of age were obtained and reviewed. Each instrument was judged 
according to the following criteria: 

(a) Relevance of Content: Items had to measure some problem 
solving ability, either cognitive or motor. Social 
maturity scales, for example, were considered outside the 
realm of cognitive development, as were projectives or 
other instruments designed to measure personality - social- 
emotional variables. 

(b) Physical characteristics: Each instrument chosen had to be 
appropriately designed for the designated age group. For- 
mat, picture size, and item characteristics were major de- 
terminants for the inclusion of instruments. 

(c) Type of Test: As wide a range of testing items as possible 
was desired; tests which included a variety of items or 
tests which presented items in an unusual way (e.g., Arthur 
stencil design) were preferred. 

An item classification technique was developed as the tests 
were reviewed. Each test item was classified according to content 
and format so that the item classification of a test [...] 
profile against which other tests could be compared. In this manner 
twenty-two tests were selected for the study. Some were selected 
for their wide range of items (e.g., the Binet and WPPSI), some for 
their unusual format (e.g., the Leiter) and some to test for specific 
abilities not adequately covered by broad general instruments (e.g., 
Frostig). 

The design of the study required each child to be tested within 
a one-month period in order to avoid the contaminating factor of 
maturity. Other factors, such as fatigue, maintenance of [...] 
and learning as a result of being tested made it necessary that a 
child not be tested too frequently nor be given too many similar [...] 

The tests, therefore, were organized into four batteries each 
of which was to be administered to one-fourth of the total sample. 

The assignment of the tests into batteries was based on several 
factors: 

(a) Content: Each battery was to have approximately the same 
content. This was possible only to a limited extent. 
Wherever feasible each battery contained number items, 
vocabulary items, spatial relations tasks, etc. 

(b) Format: Each battery was as varied (within itself) as 
possible; verbal and non-verbal tasks, different item 
characteristics and types of tests were all taken into 
consideration. (For the sake of efficient administra- 
tion the tests themselves could not be broken down in 
order to assign some items to one battery and some items 
to another. This was done 
only in one case. It was necessary, therefore, that a 
battery be varied by the tests it included rather than 
by items from different instruments.) 

(c) Testing Time: Tests were assigned to batteries in approx- 
imately equal time units, about four hours for each battery. 

In order that some basis for relating items across batteries 
would exist, two tests were designated as "anchor" tests. These 
were chosen for their wide range of content and different item 
types. The anchor battery was composed of the Stanford-Binet 
Intelligence Scale (through year VII) and the Wechsler Pre-School 
and Primary Scale of Intelligence (WPPSI). The color items from 
the Caldwell Scale Pre-School Inventory were administered with the 
WPPSI. The anchor battery was administered to each child in the 
sample followed by one of the four test batteries. 
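
The anchor design described above can be sketched as follows. The report does not say in this section how children were allocated to the four batteries, so the shuffled round-robin allocation, the function name, and the child identifiers below are assumptions made purely for illustration.

```python
import random
from collections import Counter

BATTERIES = ("I", "II", "III", "IV")


def assign_batteries(child_ids, seed=0):
    """Give every child the anchor battery plus one of the four test batteries.

    Children are shuffled (an assumed allocation rule) and dealt out in turn,
    so each battery goes to roughly one-fourth of the sample.
    """
    ids = list(child_ids)
    random.Random(seed).shuffle(ids)
    return {cid: ("Anchor", BATTERIES[i % len(BATTERIES)])
            for i, cid in enumerate(ids)}


if __name__ == "__main__":
    plan = assign_batteries(range(1, 13))  # twelve hypothetical children
    print(Counter(battery for _, battery in plan.values()))
```

Because every child takes the anchor tests, items from different batteries can be related to one another through performance on the shared anchor items.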

Schedule of Research Tests Administered 



Anchor Tests: 



Battery I : 



Battery II: 



Battery III: 



Stanford-Binet Intelligence Scale (1960) 

Wechsler Preschool and Primary Scale of Int 
(1966) 

SRA Primary Mental Abilities (1953) 

Preschool Inventory, Caldwell and Soule (19 

Frostig Developmental Tests of Visual Perce 
(1961) 

Columbia Mental Maturity Scale (1959) 

Let’s Look at First Graders (adapted for re 
purposes) (logical reasoning) 

Illinois Test of Psycholinguistic Abilities 
Raven Progressive Matrices for Children (19 
Winterhaven Perceptual Forms (1967) 

Let’s Look at First Graders (Mathematics) 

Minnesota Preschool Scale (1940) 

Merrill Palmer Scale (1931) 

Arthur Point Scale of Performance Tests (1- 

Arthur Adaptation of Leiter International I 
formance Scale (1948) 

Let’s Look at First Graders (time concepts 

Metropolitan Readiness Test (1943) 

Culture Free Intelligence Test (1950) 
Peabody Picture Vocabulary (1959) 
Goodenough-Harris Drawing Test (1963) 

Let’s Look at First Graders (Spatial relat 
Oseretsky Tests of Motor Proficiency 



Battery IV: 


A person or persons will be appointed by the Field 
Administrator to supervise children who are waiting to 
be tested. This may be a teacher or other employee of 
the school or center where the testing is to take place. 

4. This will facilitate good scheduling of 
children to be tested, prevent unnecessary 
loss of time, and keep children quiet, 
happy, and occupied. 


Appendix E 

Procedures Used in the 
Selection and Training of 
Testers 

TESTER SELECTION 

The initial complement of testers employed for the summer of 1967 con- 
sisted primarily of graduate students majoring in either education or psy- 
chology. These persons were screened before employment with particular 
attention to their educational background, their experience in testing and 
their experience with young children. They were trained in maximum perfor- 
mance testing under the supervision of a member of the Committee on Educa- 
tional Research. Each tester was observed in practice situations with 
children and no testers were allowed to participate in data collection until 
they were judged competent to administer the tests in this manner. These 
testers collected data solely within the Southern disadvantaged subpopulation. 

In the fall of 1967 other testers were employed on a full-time, perma- 
nent basis. All applicants were interviewed by the projects coordinator 
employed by the Committee on Educational Research. Applicants who met the 
qualifications and who were deemed to have a high probability of becoming 
competent testers were referred to the Dean of the School of Education for 
a final interview. This general format was followed in the selection of 
all testers other than those employed during the summer of 1967. 

In the spring of 1968 testing operations were extended into Pittsburgh, 
Pennsylvania, the location of the Northern subpopulation samples. At that 
time the projects coordinator was preparing to accept other employment; he 
and his successor-elect along with the assistant director of the Head Start 
Evaluation project visited Pittsburgh to screen applicants for the position 
of tester. Those applicants selected for employment were then interviewed 
by the Dean of the School of Education prior to final employment. 

Additional testers were interviewed by the assistant project director, 
the director of research operations (formerly projects coordinator), two 
members of the tester training staff and the Dean of the School of Education. 
Thus, with each successive group of testers, the screening procedures were 
refined in such a way as to afford maximum exposure of applicants to project 
officials. 

Criteria in addition to the minimum job qualifications included experience 
with young children, educational background and vocational experience. 
Furthermore, each tester who was employed was required to possess an automobile 
and to be willing to travel rather extensively for the purpose of data collec- 
tion. 



TRAINING OF TESTERS 




Testers were trained in groups of three to eight throughout the project. 
After the first summer of testing it was decided that females between twenty 
and thirty years of age were best suited for testing pre-school children. 

From experience, it was found that older women found it difficult to estab- 
lish adequate rapport and had a tendency to "teach" rather than test. Men 
were sometimes intimidating to young children, particularly deprived children 
who are unaccustomed to white males except in authoritarian roles such as 
policemen. 

Training for a group of testers required from two to four weeks, depend- 
ing on the amount of materials to be learned and the size of the group in 
training. A typical training session began with a half day orientation in 
which the testers were told about the research project and the part they 
would be expected to play. Each tester was given all the materials she would 
need to administer her battery. The testers were expected to learn the tests 
thoroughly before children were tested. In order to familiarize themselves 
with the items, the testers tested each other. An instructor went over each 
test, item by item, with the testers, explaining what information each item 
attempted to elicit, the purpose of each item and the type of responses an 
examiner might expect. 
The testers were instructed on the basic differences (in testing situa- 
tions) between deprived and middle class children, and insofar as possible, 
how to handle difficult situations such as temper tantrums, withdrawal or 
hyperactivity. Testers observed demonstrations of the tests being given by 
experienced testers. Finally, children from local Head Start centers and 
middle class children from private kindergartens or public schools were used 
in the training. A meeting was held after each administration so that the 
instructor could point out errors, answer questions, and discuss children's 
responses. 

When the testers had mastered the materials and achieved satisfactory 
techniques of dealing with children they were observed by one or more 
members of the quality control staff. Once passed by the quality control 
staff they were observed for final certification by a clinical psychology 
diplomate. After a tester received her final certification she was required 
to practice in the field under "actual" field conditions for one to two weeks 
before being permitted to gather data for the investigation. 

Refresher training was required whenever a tester had not administered 
a particular test for more than two weeks. Each tester was observed in the 
field by a member of the quality control staff approximately every two weeks. 

Constant observation, refresher training and the elaborate original train- 
ing were made necessary by the approach to testing which was used. "Maximum 
Performance Testing" is not a standardized approach. It was imperative, 
therefore, to make certain the testers maintained consistent techniques in the 
presentation and probing of each item. 

TRAINING PROCEDURES 

I. A. Overall introduction to project, what we are trying to do and why, 
including theory, data analysis and tests involved. 

B. Assignment of specific tests to be learned. 

II. A. Trainer will go over tests with tester, item by item if necessary. 

B. Tester will give test to trainer or to another tester with trainer 
present. 

III. A. Tester will test child with trainer present. 

B. Tester will test second child with trainer and observer present - 
tester will be rated by both. 

Each tester must reach established performance criteria according to both trainer 
and observer. A tester will be allowed a third testing session with a child 
to reach performance criteria; if his rating is not acceptable after this 
session, he will not be employed as a tester. 

RETRAINING OF TESTERS 

If a tester needs to be trained on additional tests, the following proce- 
dures will be followed. 

I. A. Overview of tests to be learned. 

B. Assignment of test materials. 

C. Tester studies materials at home. 

II. A. Trainer will go over test with tester, item by item if necessary. 

B. Tester will test child with trainer and observer present - the tester 

will be rated by both. 

A tester will be allowed a second session with a child in order to reach per- 
formance criteria. 

No tester will go into the field without having met performance criteria accord- 
ing to both the trainer and an observer. 

SAMPLE TRAINING SCHEDULE 
Training Agenda 

Binet/WPPSI/Battery I - Sept. 1 - Sept. 15 



Friday, Sept. 1, 1967 

9:00 - 12:30    Overall introduction to project. 
                What we are trying to do and why, 
                including theory, data analysis, 
                and tests involved. 

                Assignment of specific tests to be learned. 
                When and how the training sessions 
                are to be conducted. (This includes: 
                a) discussion of tests, b) demonstra- 
                tion of some tests, c) practice among 
                testers, and d) practice with child- 
                ren.) 

                Explanation of evaluation procedure for 
                all testing performances. 

                Distribution of test materials: 
                a) Check kits for completeness. 
                b) Binet and Battery I handouts. 
                c) Testers 1, 2, and 3 will receive 
                Binet and Frostig materials. 
                d) Testers 4, 5, and 6 will receive 
                Binet and PMA materials. 

                Meet with Mr. Porter. 



Tuesday, Sept. 5, 1967 

9:00 - 11:00    Discussion of Binet items - to be gone over 
                item by item if necessary. Answer all 
                questions. 
11:00 - 12:30   Demonstration of Binet test. 
12:30 - 1:00    Additional questions about Binet. 
2:30 - 3:30     Discussion of Frostig and PMA with testers 
                so designated. 
3:30 - 4:30     Demonstration of Frostig and PMA. 
4:30 - 5:00     Question period. 

Wednesday, Sept. 6, 1967 

9:00 - 11:00    Administration of Binet among testers. 
                (3 groups of 2 testers - evaluators watch- 
                ing - each tester is to act as an S and an E.) 
11:00 - 11:30   Questions and rest break. 
11:30 - 12:30   Administration among testers of Frostig 
                and PMA. 
1:30 - 3:00     Administration of Binet. 
3:00 - 4:00     Administration of Frostig. 
                Administration of PMA. 
4:00 - 5:00     Question period. 

Thursday, Sept. 7, 1967 

9:00 - 12:15    Binet testing with children. Each tester 
                giving two tests. 
12:15 - 12:30   Question period. 
1:30 - 3:00     Binet testing. 
3:00 - 4:00     Frostig and PMA testing. 
4:00 - 4:30     Question period. 

Friday, Sept. 8, 1967 

9:00 - 10:30    Binet testing. 
10:00 - 11:30   Frostig and PMA testing. 
12:30 - 2:00    Distribution of test materials 
a) Testers 1, 2, and 3 receive WPPSI, 
                b) Testers 4, 5, and 6 receive WPPSI,
                   Columbia Mental Maturity and LLFG(Y).

Monday, Sept. 11, 1967

9:00 - 11:00    Discussion of WPPSI - to be gone over item
                by item if necessary.
11:00 - 1:00    Demonstration of WPPSI and additional
                questions.
1:00 - 3:00     Discussion of Caldwell and LLFG(X).
                Discussion of Columbia and LLFG(Y).
3:00 - 4:00     Demonstration of Caldwell and LLFG(X).
                Demonstration of Columbia and LLFG(Y).
4:00 - 4:30     Additional questions.

Tuesday, Sept. 12, 1967

9:00 - 11:00    Practice administration of WPPSI among
                testers.
11:00 - 11:30   Question period.
11:30 - 12:30   Practice administration of Caldwell and
                LLFG(X).
                Practice administration of Columbia and
                LLFG(Y).
1:30 - 3:00     Administration of WPPSI.
3:00 - 4:00     Administration of Caldwell and LLFG(X).
                Administration of Columbia and LLFG(Y).
4:00 - 5:00     Additional question period.

Wednesday, Sept. 13, 1967

9:00 - 10:30    WPPSI testing.
10:45 - 12:15   WPPSI testing.
12:15 - 12:30   Question period.
1:30 - 3:00     WPPSI testing.
3:00 - 4:00     Caldwell and LLFG(X) testing.
                Columbia and LLFG(Y) testing.
4:00 - 5:00     Question period.

Thursday, Sept. 14, 1967

9:00 - 10:30    WPPSI testing.
10:30 - 11:30   Caldwell and LLFG(X) testing.
                Columbia and LLFG(Y) testing.
11:30 - 12:00   Review of testing procedure, including the
                correct way to complete answer sheets, order
                tests are to be administered, etc.

Friday, Sept. 15, 1967

9:00            Additional training of testers as needed.
SAMPLE TRAINING SCHEDULE
SCHEDULE FOR TRAINING NEW TESTERS
October 14 - October 25, 1968

Monday, October 14          Location - Conference Room

9:00 - 10:00    Organizational Orientation
10:00 - 11:30*  Meeting with Mr. Statler
1:00 - 5:00     Orientation

Tuesday, October 15         Location - Conference Room

9:00 - 12:00*   Orientation
1:00 - 5:00     Distribution and item by item discussion of
                Metropolitan, LLFG-X, Peabody and Binet.

Wednesday, October 16

9:00 - 10:00    Discussion and final instructions.
10:00 - 11:00   First Administration of LLFG-X, Met. I and Binet
11:00 - 12:00*  Discussion
1:00 - 1:45     Second Administration of LLFG-X, Met. I
1:00 - 2:15     Second Administration of Binet
1:45 - 2:15     Discussion for Battery IV testers
2:15 - 2:45     Discussion for Binet/WPPSI testers
2:15 - 3:00     First Administration of Met. II/Peabody
2:45 - 4:00     Third Administration of Binet
3:00 - 3:30     Discussion for Battery IV testers
3:30 - 4:00     Second Administration of Met. II/Peabody
4:00 - 5:00     Discussion and distribution of WPPSI and Oseretsky

*At the end of this session, break for lunch.
Thursday, October 17

9:00 - 12:00*   Item by item discussion and inter-tester
                administration of Oseretsky.
9:00 - 10:00    Fourth Administration of Binet
10:00 - 10:30   Discussion
10:30 - 11:30   Fifth Administration of Binet
11:30 - 12:00*  Discussion
1:00 - 2:00     Sixth Administration of Binet; First
                Administration of Oseretsky
2:00 - 3:00     Discussion
3:00 - 5:00     Distribution and item by item discussion of
                WPPSI
3:00 - 4:00     Second Administration of Oseretsky
4:00 - 5:00     Discussion and Distribution of Culture Fair
                and LLFG-Y

Friday, October 18

9:00 - 10:00    First Administration of WPPSI; Item by item
                discussion of Culture Fair and LLFG-Y
10:00 - 10:30   Discussion
10:30 - 11:30   Second Administration of WPPSI; Third
                Administration of Oseretsky
11:30 - 12:30*  Discussion
1:30 - 2:30     Third Administration of WPPSI; First
                Administration of Culture Fair/LLFG-Y
2:30 - 3:00     Discussion
3:00 - 4:00     Fourth Administration of WPPSI; Second
                Administration of Culture Fair/LLFG-Y
4:00 - 5:00     Discussion


Appendix F 

Instruments used in Routine Evaluation of Testers 
and Testing Situations with Sample Comments 
from Quality Control Observers 